SSL, Fine Tuning, and Linear Probing Heads

none of these work!

Linear Probing and Fine Tuning Heads


source

RNNProbingHead

 RNNProbingHead (c_in, input_size, hidden_size, n_classes,
                 contrastive=False, module='GRU', linear_dropout=0.0,
                 rnn_dropout=0.0, num_rnn_layers=1, act='gelu',
                 pool='average', temperature=2.0, n_linear_layers=1,
                 predict_every_n_patches=1, bidirectional=True,
                 affine=False, shared_embedding=True, augmentations=None,
                 augmentation_mask_ratio=0.0,
                 augmentation_dims_to_shuffle=[1, 2, 3], norm=None)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:`to`, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*
| | Type | Default | Details |
|---|---|---|---|
| c_in | | | |
| input_size | | | |
| hidden_size | | | |
| n_classes | | | |
| contrastive | bool | False | |
| module | str | GRU | |
| linear_dropout | float | 0.0 | |
| rnn_dropout | float | 0.0 | |
| num_rnn_layers | int | 1 | |
| act | str | gelu | |
| pool | str | average | 'average', 'max' or 'majority' |
| temperature | float | 2.0 | only used if pool='majority' |
| n_linear_layers | int | 1 | |
| predict_every_n_patches | int | 1 | |
| bidirectional | bool | True | |
| affine | bool | False | |
| shared_embedding | bool | True | |
| augmentations | NoneType | None | |
| augmentation_mask_ratio | float | 0.0 | |
| augmentation_dims_to_shuffle | list | [1, 2, 3] | |
| norm | NoneType | None | one of [None, 'pre', 'post'] |
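
Judging from the examples below, predict_every_n_patches controls how many patches are pooled into each output step: a sequence of n_patches patches yields n_patches // predict_every_n_patches predictions. A minimal, hedged shape check (the hidden_size and other argument values here are illustrative, not canonical defaults):

import torch

head = RNNProbingHead(c_in=7, input_size=384, hidden_size=128,
                      n_classes=4, predict_every_n_patches=32)
x = torch.randn(4, 7, 384, 960)               # (batch, c_in, d_model, n_patches)
print(head(x, return_softmax=True).shape)     # expected: torch.Size([4, 4, 30]), i.e. 960 // 32 predictions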

source

RNNProbingHeadExperimental

 RNNProbingHeadExperimental (c_in, input_size, hidden_size, n_classes,
                             contrastive=False, module='GRU',
                             linear_dropout=0.0, rnn_dropout=0.0,
                             num_rnn_layers=1, act='gelu', pool='average',
                             temperature=2.0, predict_every_n_patches=1,
                             bidirectional=True, affine=False,
                             augmentations=None,
                             augmentation_mask_ratio=0.0,
                             augmentation_dims_to_shuffle=[1, 2, 3],
                             pre_norm=True, mlp_final_head=False)

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | |
| input_size | | | |
| hidden_size | | | |
| n_classes | | | |
| contrastive | bool | False | deprecated |
| module | str | GRU | |
| linear_dropout | float | 0.0 | |
| rnn_dropout | float | 0.0 | |
| num_rnn_layers | int | 1 | |
| act | str | gelu | |
| pool | str | average | 'average', 'max' or 'majority' |
| temperature | float | 2.0 | only used if pool='majority' |
| predict_every_n_patches | int | 1 | |
| bidirectional | bool | True | |
| affine | bool | False | |
| augmentations | NoneType | None | |
| augmentation_mask_ratio | float | 0.0 | |
| augmentation_dims_to_shuffle | list | [1, 2, 3] | |
| pre_norm | bool | True | one of [None, 'pre', 'post'] |
| mlp_final_head | bool | False | |
m = RNNProbingHeadExperimental(c_in=7, 
                                pool='average', 
                                input_size = 384, 
                                bidirectional=True,
                                affine=False, 
                                hidden_size=1200,
                                module='GRU',
                                n_classes=4,
                                predict_every_n_patches=32,
                                rnn_dropout=0.,
                                num_rnn_layers=1,
                                linear_dropout=0.,
                                mlp_final_head=False,
                                pre_norm=True)
x = torch.randn((4, 7, 384, 960))              # (batch, c_in, d_model, n_patches)
sequence_padding_mask = torch.zeros(4, 960)
sequence_padding_mask[:, -32:] = 1             # mask covering the last 32 patches
m(x, return_softmax=True, sequence_padding_mask=sequence_padding_mask).shape
torch.Size([4, 4, 30])
m = RNNProbingHead(c_in=7,
                   pool='majority',
                   input_size=384,
                   contrastive=False,
                   bidirectional=True,
                   affine=True,
                   shared_embedding=False,
                   hidden_size=384,
                   module='GRU',
                   n_classes=4,
                   predict_every_n_patches=32,
                   rnn_dropout=0.,
                   num_rnn_layers=1,
                   linear_dropout=0.,
                   n_linear_layers=1,
                   norm='post')
x = torch.randn((4,7,384,960))

m(x, return_softmax=True).shape
torch.Size([4, 4, 30])
m = RNNProbingHead(c_in=7,
                   input_size=512,
                   contrastive=True,
                   bidirectional=True,
                   affine=False,
                   shared_embedding=True,
                   hidden_size=256,
                   module='GRU',
                   n_classes=5,
                   predict_every_n_patches=5,
                   rnn_dropout=0.,
                   num_rnn_layers=1,
                   linear_dropout=0.,
                   n_linear_layers=1)
x = torch.randn((4, 7, 512*2, 3600))   # note: the d_model axis is 2*input_size here (contrastive=True)

m(x, return_softmax=True).shape
torch.Size([4, 5, 720])

source

TransformerDecoderProbingHead

 TransformerDecoderProbingHead (c_in, d_model, n_classes,
                                norm='BatchNorm', dropout=0.0, act='gelu',
                                d_ff=2048, num_layers=1, n_heads=2,
                                predict_every_n_patches=1, affine=False,
                                shared_embedding=True)


layer = TransformerDecoderProbingHead(c_in=7,
                                      affine=True,
                                      shared_embedding=False,
                                      d_model=512,
                                      n_classes=5,
                                      dropout=0.,
                                      num_layers=1,
                                      n_heads=2,
                                      predict_every_n_patches=5)
x = torch.randn((4, 7, 512, 3600))

layer(x).shape
torch.Size([4, 5, 720])

source

DecoderFeedForward

 DecoderFeedForward (c_in, predict_every_n_patches, num_layers, d_ff,
                     attn_dropout, res_attention, pre_norm, store_attn,
                     n_heads, shared_embedding, affine, n_classes,
                     d_model, norm='BatchNorm', act='gelu', dropout=0.0)

A transformer decoder with attention for feed-forward predictions. In practice it is just another encoder layer followed by a linear layer, a 1d convolution and a softmax, but it can be useful as a linear-probing head.
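
As a rough, hedged sketch of that stack in plain PyTorch (illustrative only, not the class internals: it ignores the channel dimension c_in and the norm/affine options, and the Conv1d pooling choice is an assumption):

import torch
import torch.nn as nn

d_model, n_classes, predict_every_n_patches = 512, 5, 5
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=2,
                                           dim_feedforward=2048, batch_first=True)
to_classes = nn.Linear(d_model, n_classes)
# pool every `predict_every_n_patches` patch-level logits into one prediction
pool = nn.Conv1d(n_classes, n_classes, kernel_size=predict_every_n_patches,
                 stride=predict_every_n_patches)

z = torch.randn(2, 3600, d_model)        # (batch, n_patches, d_model), single channel
z = encoder_layer(z)                     # self-attention over patches
logits = to_classes(z).transpose(1, 2)   # (batch, n_classes, n_patches)
probs = pool(logits).softmax(dim=1)      # (batch, n_classes, n_patches // 5)
print(probs.shape)                       # torch.Size([2, 5, 720])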

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| predict_every_n_patches | | | emit one prediction per this many patches (a sequence of m patches yields m // predict_every_n_patches predictions) |
| num_layers | | | |
| d_ff | | | |
| attn_dropout | | | |
| res_attention | | | |
| pre_norm | | | |
| store_attn | | | |
| n_heads | | | |
| shared_embedding | | | |
| affine | | | |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| d_model | | | the dimension of the transformer model |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, 'gelu' or 'relu' |
| dropout | float | 0.0 | dropout in between linear layers |
c_in = 7
frequency = 125
win_length = 750
overlap = 0.
hop_length = win_length - int(overlap * win_length)
max_seq_len_sec = 6 * 3600                     # maximum sequence length in seconds (for the dataloader)
max_seq_len = max_seq_len_sec * frequency      # maximum sequence length in samples (for the model)
n_patches = (max(max_seq_len, win_length) - win_length) // hop_length + 1   # 3600 patches here

x = torch.randn(2, c_in, 512, n_patches)       # (batch, c_in, d_model, n_patches)

model = DecoderFeedForward(c_in=c_in,
                           predict_every_n_patches=5,
                           num_layers=1,
                           d_ff = 2048,
                           attn_dropout=0.,
                           res_attention = False,
                           pre_norm = False,
                           store_attn = False,
                           n_heads=2,
                           affine=False,
                           shared_embedding=False,
                           n_classes=5,
                           d_model=512,
                           norm='BatchNorm',
                           act='gelu',
                           dropout=0.
                           )

model(x).shape
torch.Size([2, 5, 720])

source

TimeDistributedConvolutionalFeedForward

 TimeDistributedConvolutionalFeedForward (c_in, frequency,
                                          predict_every_seconds,
                                          n_classes, win_length, d_model,
                                          affine=False,
                                          shared_embedding=True)

Convolutional feed-forward head that first uses a linear feed-forward network to project features into the original convolutional dimension. A transposed convolution then upsamples the data back to its original form, and a final convolution predicts the classes.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| frequency | | | the sampling frequency of the original channels |
| predict_every_seconds | | | emit one prediction per this many seconds of input |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| win_length | | | the convolved patch length; the first step is a linear layer to this dimension |
| d_model | | | the dimension of the transformer model |
| affine | bool | False | |
| shared_embedding | bool | True | |
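
There is no usage example for this head in the notebook; the sketch below is a hedged guess that assumes the same (batch, c_in, d_model, n_patches) input layout as the other heads, with illustrative argument values borrowed from the examples elsewhere on this page:

import torch

# Hedged sketch: the argument values, input layout and forward signature are assumptions.
head = TimeDistributedConvolutionalFeedForward(c_in=7,
                                               frequency=125,
                                               predict_every_seconds=30,
                                               n_classes=5,
                                               win_length=750,
                                               d_model=512)
x = torch.randn(4, 7, 512, 3600)   # assumed (batch, c_in, d_model, n_patches)
print(head(x).shape)               # expected: per-segment class predictions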

source

LinearProbingHead

 LinearProbingHead (c_in, predict_every_n_patches, n_classes, input_size,
                    n_layers, num_patch, shared_embedding=True,
                    affine=True, norm='BatchNorm', act='gelu',
                    dropout=0.0)

A linear probing head (with an optional MLP). It assumes that each d_model vector corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels in the original input |
| predict_every_n_patches | | | emit one prediction per this many patches |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| input_size | | | the dimension of the transformer model |
| n_layers | | | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| num_patch | | | |
| shared_embedding | bool | True | use one shared dense layer across channels (True) or a separate layer per channel (False) |
| affine | bool | True | include learnable parameters to weight predictions |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, 'gelu' or 'relu' |
| dropout | float | 0.0 | dropout in between linear layers |
m = LinearProbingHead(c_in=7, 
                      input_size = 512, 
                      predict_every_n_patches=5,
                      n_classes=5,
                      n_layers=3,
                      shared_embedding=True,
                      affine=True,
                      num_patch=3600,
                      dropout=0.1)

x = torch.randn((4,7,512,3600))

m(x, return_softmax=True).shape
torch.Size([4, 5, 720])

source

TimeDistributedFeedForward

 TimeDistributedFeedForward (c_in, n_classes, n_patches, d_model,
                             pred_len_seconds, n_linear_layers,
                             conv_kernel_stride_size, dropout=0.0)

Feed-forward head that uses a convolutional layer to reduce channel dimensionality, followed by a feed-forward network that makes the class predictions.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| n_patches | | | the number of stft or time patches |
| d_model | | | the dimension of the transformer model |
| pred_len_seconds | | | the sequence multiclass prediction length in seconds |
| n_linear_layers | | | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| conv_kernel_stride_size | | | the 1d convolution kernel size and stride, in seconds; for a prediction every 30 seconds, set this to 30 |
| dropout | float | 0.0 | dropout in between linear layers |
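
No usage example is given for this head either; the sketch below is a hedged guess. Passing conv_kernel_stride_size as a (kernel, stride) pair in seconds is an assumption (it may also accept a single value), and the input layout is assumed to match the other heads:

import torch

# Hedged sketch: argument values, the (kernel, stride) pair and the input layout are assumptions.
head = TimeDistributedFeedForward(c_in=7,
                                  n_classes=5,
                                  n_patches=3600,
                                  d_model=512,
                                  pred_len_seconds=6 * 3600,
                                  n_linear_layers=2,
                                  conv_kernel_stride_size=(30, 30),
                                  dropout=0.1)
x = torch.randn(4, 7, 512, 3600)   # assumed (batch, c_in, d_model, n_patches)
print(head(x).shape)               # expected: one prediction per 30-second segment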

source

ConvBiGRU

 ConvBiGRU (input_size, hidden_sizes, kernel_sizes, n_layers, d_model,
            predict_every_n_patches, n_classes)



source

ConvGRU1D

 ConvGRU1D (input_size, hidden_sizes, kernel_sizes, n_layers)



source

ConvGRU1DCell

 ConvGRU1DCell (input_size, hidden_size, kernel_size)

Generate a convolutional GRU cell
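
For orientation, a convolutional GRU cell is the standard GRU recurrence with the dense projections swapped for 1d convolutions, so the hidden state keeps a spatial axis. A generic, self-contained sketch of that idea (not the internals of ConvGRU1DCell):

import torch
import torch.nn as nn

class SketchConvGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size, kernel_size):
        super().__init__()
        padding = kernel_size // 2
        # update (z) and reset (r) gates, computed jointly from [x, h]
        self.gates = nn.Conv1d(input_size + hidden_size, 2 * hidden_size,
                               kernel_size, padding=padding)
        # candidate hidden state from [x, r * h]
        self.cand = nn.Conv1d(input_size + hidden_size, hidden_size,
                              kernel_size, padding=padding)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

cell = SketchConvGRUCell(input_size=7, hidden_size=32, kernel_size=3)
h = cell(torch.randn(4, 7, 512), torch.zeros(4, 32, 512))   # (batch, hidden_size, length)
print(h.shape)                                              # torch.Size([4, 32, 512])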

x = torch.randn((4,7,512,3600))

convgru = ConvBiGRU(input_size=7, hidden_sizes=32, kernel_sizes=3, n_layers=1, d_model=512, predict_every_n_patches=5, n_classes=5)

out = convgru(x)
out.shape
torch.Size([4, 5, 720])