SSL, Fine Tuning, and Linear Probing Heads

none of these work!

Linear Probing and Fine Tuning Heads


source

RNNProbingHead

 RNNProbingHead (c_in, input_size, hidden_size, n_classes,
                 contrastive=False, module='GRU', linear_dropout=0.0,
                 rnn_dropout=0.0, num_rnn_layers=1, act='gelu',
                 pool='average', temperature=2.0, n_linear_layers=1,
                 predict_every_n_patches=1, bidirectional=True,
                 affine=False, shared_embedding=True, augmentations=None,
                 augmentation_mask_ratio=0.0,
                 augmentation_dims_to_shuffle=[1, 2, 3], norm=None)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:`to`, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*
| | Type | Default | Details |
|---|---|---|---|
| c_in | | | |
| input_size | | | |
| hidden_size | | | |
| n_classes | | | |
| contrastive | bool | False | |
| module | str | GRU | |
| linear_dropout | float | 0.0 | |
| rnn_dropout | float | 0.0 | |
| num_rnn_layers | int | 1 | |
| act | str | gelu | |
| pool | str | average | 'average', 'max' or 'majority' |
| temperature | float | 2.0 | only used if pool='majority' |
| n_linear_layers | int | 1 | |
| predict_every_n_patches | int | 1 | |
| bidirectional | bool | True | |
| affine | bool | False | |
| shared_embedding | bool | True | |
| augmentations | NoneType | None | |
| augmentation_mask_ratio | float | 0.0 | |
| augmentation_dims_to_shuffle | list | [1, 2, 3] | |
| norm | NoneType | None | one of [None, 'pre', 'post'] |
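
Judging from the examples below, predict_every_n_patches controls how many patches are pooled into each output step: a sequence of n_patches patches yields n_patches // predict_every_n_patches predictions. A minimal, hedged shape check (the hidden_size and other argument values here are illustrative, not canonical defaults):

import torch

head = RNNProbingHead(c_in=7, input_size=384, hidden_size=128,
                      n_classes=4, predict_every_n_patches=32)
x = torch.randn(4, 7, 384, 960)               # (batch, c_in, d_model, n_patches)
print(head(x, return_softmax=True).shape)     # expected: torch.Size([4, 4, 30]), i.e. 960 // 32 predictions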

source

RNNProbingHeadExperimental

 RNNProbingHeadExperimental (c_in, input_size, hidden_size, n_classes,
                             contrastive=False, module='GRU',
                             linear_dropout=0.0, rnn_dropout=0.0,
                             num_rnn_layers=1, act='gelu', pool='average',
                             temperature=2.0, predict_every_n_patches=1,
                             bidirectional=True, affine=False,
                             augmentations=None,
                             augmentation_mask_ratio=0.0,
                             augmentation_dims_to_shuffle=[1, 2, 3],
                             pre_norm=True, mlp_final_head=False)

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | |
| input_size | | | |
| hidden_size | | | |
| n_classes | | | |
| contrastive | bool | False | deprecated |
| module | str | GRU | |
| linear_dropout | float | 0.0 | |
| rnn_dropout | float | 0.0 | |
| num_rnn_layers | int | 1 | |
| act | str | gelu | |
| pool | str | average | 'average', 'max' or 'majority' |
| temperature | float | 2.0 | only used if pool='majority' |
| predict_every_n_patches | int | 1 | |
| bidirectional | bool | True | |
| affine | bool | False | |
| augmentations | NoneType | None | |
| augmentation_mask_ratio | float | 0.0 | |
| augmentation_dims_to_shuffle | list | [1, 2, 3] | |
| pre_norm | bool | True | one of [None, 'pre', 'post'] |
| mlp_final_head | bool | False | |
m = RNNProbingHeadExperimental(c_in=7, 
                                pool='average', 
                                input_size = 384, 
                                bidirectional=True,
                                affine=False, 
                                hidden_size=1200,
                                module='GRU',
                                n_classes=4,
                                predict_every_n_patches=32,
                                rnn_dropout=0.,
                                num_rnn_layers=1,
                                linear_dropout=0.,
                                mlp_final_head=False,
                                pre_norm=True)
x = torch.randn((4, 7, 384, 960))              # (batch, c_in, d_model, n_patches)
sequence_padding_mask = torch.zeros(4, 960)
sequence_padding_mask[:, -32:] = 1             # mask covering the last 32 patches
m(x, return_softmax=True, sequence_padding_mask=sequence_padding_mask).shape
torch.Size([4, 4, 30])
m = RNNProbingHead(c_in=7,
                   pool='majority',
                   input_size=384,
                   contrastive=False,
                   bidirectional=True,
                   affine=True,
                   shared_embedding=False,
                   hidden_size=384,
                   module='GRU',
                   n_classes=4,
                   predict_every_n_patches=32,
                   rnn_dropout=0.,
                   num_rnn_layers=1,
                   linear_dropout=0.,
                   n_linear_layers=1,
                   norm='post')
x = torch.randn((4,7,384,960))

m(x, return_softmax=True).shape
torch.Size([4, 4, 30])
m = RNNProbingHead(c_in=7,
                   input_size=512,
                   contrastive=True,
                   bidirectional=True,
                   affine=False,
                   shared_embedding=True,
                   hidden_size=256,
                   module='GRU',
                   n_classes=5,
                   predict_every_n_patches=5,
                   rnn_dropout=0.,
                   num_rnn_layers=1,
                   linear_dropout=0.,
                   n_linear_layers=1)
x = torch.randn((4, 7, 512*2, 3600))   # note: the d_model axis is 2*input_size here (contrastive=True)

m(x, return_softmax=True).shape
torch.Size([4, 5, 720])

source

TransformerDecoderProbingHead

 TransformerDecoderProbingHead (c_in, d_model, n_classes,
                                norm='BatchNorm', dropout=0.0, act='gelu',
                                d_ff=2048, num_layers=1, n_heads=2,
                                predict_every_n_patches=1, affine=False,
                                shared_embedding=True)


layer = TransformerDecoderProbingHead(c_in=7,
                                      affine=True,
                                      shared_embedding=False,
                                      d_model=512,
                                      n_classes=5,
                                      dropout=0.,
                                      num_layers=1,
                                      n_heads=2,
                                      predict_every_n_patches=5)
x = torch.randn((4, 7, 512, 3600))

layer(x).shape
torch.Size([4, 5, 720])

source

DecoderFeedForward

 DecoderFeedForward (c_in, predict_every_n_patches, num_layers, d_ff,
                     attn_dropout, res_attention, pre_norm, store_attn,
                     n_heads, shared_embedding, affine, n_classes,
                     d_model, norm='BatchNorm', act='gelu', dropout=0.0)

A transformer decoder with attention for feed-forward predictions. In practice it is just another encoder layer followed by a linear layer, a 1d convolution and a softmax, but it can be useful as a linear-probing head.
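
As a rough, hedged sketch of that stack in plain PyTorch (illustrative only, not the class internals: it ignores the channel dimension c_in and the norm/affine options, and the Conv1d pooling choice is an assumption):

import torch
import torch.nn as nn

d_model, n_classes, predict_every_n_patches = 512, 5, 5
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=2,
                                           dim_feedforward=2048, batch_first=True)
to_classes = nn.Linear(d_model, n_classes)
# pool every `predict_every_n_patches` patch-level logits into one prediction
pool = nn.Conv1d(n_classes, n_classes, kernel_size=predict_every_n_patches,
                 stride=predict_every_n_patches)

z = torch.randn(2, 3600, d_model)        # (batch, n_patches, d_model), single channel
z = encoder_layer(z)                     # self-attention over patches
logits = to_classes(z).transpose(1, 2)   # (batch, n_classes, n_patches)
probs = pool(logits).softmax(dim=1)      # (batch, n_classes, n_patches // 5)
print(probs.shape)                       # torch.Size([2, 5, 720])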

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| predict_every_n_patches | | | emit one prediction per this many patches (a sequence of m patches yields m // predict_every_n_patches predictions) |
| num_layers | | | |
| d_ff | | | |
| attn_dropout | | | |
| res_attention | | | |
| pre_norm | | | |
| store_attn | | | |
| n_heads | | | |
| shared_embedding | | | |
| affine | | | |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| d_model | | | the dimension of the transformer model |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, 'gelu' or 'relu' |
| dropout | float | 0.0 | dropout in between linear layers |
c_in = 7
frequency = 125
win_length = 750
overlap = 0.
hop_length = win_length - int(overlap * win_length)
max_seq_len_sec = 6 * 3600                     # maximum sequence length in seconds (for the dataloader)
max_seq_len = max_seq_len_sec * frequency      # maximum sequence length in samples (for the model)
n_patches = (max(max_seq_len, win_length) - win_length) // hop_length + 1   # 3600 patches here

x = torch.randn(2, c_in, 512, n_patches)       # (batch, c_in, d_model, n_patches)

model = DecoderFeedForward(c_in=c_in,
                           predict_every_n_patches=5,
                           num_layers=1,
                           d_ff = 2048,
                           attn_dropout=0.,
                           res_attention = False,
                           pre_norm = False,
                           store_attn = False,
                           n_heads=2,
                           affine=False,
                           shared_embedding=False,
                           n_classes=5,
                           d_model=512,
                           norm='BatchNorm',
                           act='gelu',
                           dropout=0.
                           )

model(x).shape
torch.Size([2, 5, 720])

source

TimeDistributedConvolutionalFeedForward

 TimeDistributedConvolutionalFeedForward (c_in, frequency,
                                          predict_every_seconds,
                                          n_classes, win_length, d_model,
                                          affine=False,
                                          shared_embedding=True)

Convolutional feed-forward head that first uses a linear feed-forward network to project features into the original convolutional dimension. A transposed convolution then upsamples the data back to its original form, and a final convolution predicts the classes.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| frequency | | | the sampling frequency of the original channels |
| predict_every_seconds | | | emit one prediction per this many seconds of input |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| win_length | | | the convolved patch length; the first step is a linear layer to this dimension |
| d_model | | | the dimension of the transformer model |
| affine | bool | False | |
| shared_embedding | bool | True | |
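
There is no usage example for this head in the notebook; the sketch below is a hedged guess that assumes the same (batch, c_in, d_model, n_patches) input layout as the other heads, with illustrative argument values borrowed from the examples elsewhere on this page:

import torch

# Hedged sketch: the argument values, input layout and forward signature are assumptions.
head = TimeDistributedConvolutionalFeedForward(c_in=7,
                                               frequency=125,
                                               predict_every_seconds=30,
                                               n_classes=5,
                                               win_length=750,
                                               d_model=512)
x = torch.randn(4, 7, 512, 3600)   # assumed (batch, c_in, d_model, n_patches)
print(head(x).shape)               # expected: per-segment class predictions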

source

LinearProbingHead

 LinearProbingHead (c_in, predict_every_n_patches, n_classes, input_size,
                    n_layers, num_patch, shared_embedding=True,
                    affine=True, norm='BatchNorm', act='gelu',
                    dropout=0.0)

A linear probing head (with an optional MLP). It assumes that each d_model vector corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels in the original input |
| predict_every_n_patches | | | emit one prediction per this many patches |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| input_size | | | the dimension of the transformer model |
| n_layers | | | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| num_patch | | | |
| shared_embedding | bool | True | use one shared dense layer across channels (True) or a separate layer per channel (False) |
| affine | bool | True | include learnable parameters to weight predictions |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, 'gelu' or 'relu' |
| dropout | float | 0.0 | dropout in between linear layers |
m = LinearProbingHead(c_in=7, 
                      input_size = 512, 
                      predict_every_n_patches=5,
                      n_classes=5,
                      n_layers=3,
                      shared_embedding=True,
                      affine=True,
                      num_patch=3600,
                      dropout=0.1)

x = torch.randn((4,7,512,3600))

m(x, return_softmax=True).shape
torch.Size([4, 5, 720])

source

TimeDistributedFeedForward

 TimeDistributedFeedForward (c_in, n_classes, n_patches, d_model,
                             pred_len_seconds, n_linear_layers,
                             conv_kernel_stride_size, dropout=0.0)

Feed-forward head that uses a convolutional layer to reduce channel dimensionality, followed by a feed-forward network that makes the class predictions.

| | Type | Default | Details |
|---|---|---|---|
| c_in | | | the number of input channels |
| n_classes | | | the number of classes to predict (for sleep staging there are 6) |
| n_patches | | | the number of stft or time patches |
| d_model | | | the dimension of the transformer model |
| pred_len_seconds | | | the sequence multiclass prediction length in seconds |
| n_linear_layers | | | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| conv_kernel_stride_size | | | the 1d convolution kernel size and stride, in seconds; for a prediction every 30 seconds, set this to 30 |
| dropout | float | 0.0 | dropout in between linear layers |
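
No usage example is given for this head either; the sketch below is a hedged guess. Passing conv_kernel_stride_size as a (kernel, stride) pair in seconds is an assumption (it may also accept a single value), and the input layout is assumed to match the other heads:

import torch

# Hedged sketch: argument values, the (kernel, stride) pair and the input layout are assumptions.
head = TimeDistributedFeedForward(c_in=7,
                                  n_classes=5,
                                  n_patches=3600,
                                  d_model=512,
                                  pred_len_seconds=6 * 3600,
                                  n_linear_layers=2,
                                  conv_kernel_stride_size=(30, 30),
                                  dropout=0.1)
x = torch.randn(4, 7, 512, 3600)   # assumed (batch, c_in, d_model, n_patches)
print(head(x).shape)               # expected: one prediction per 30-second segment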

source

ConvBiGRU

 ConvBiGRU (input_size, hidden_sizes, kernel_sizes, n_layers, d_model,
            predict_every_n_patches, n_classes)



source

ConvGRU1D

 ConvGRU1D (input_size, hidden_sizes, kernel_sizes, n_layers)



source

ConvGRU1DCell

 ConvGRU1DCell (input_size, hidden_size, kernel_size)

Generate a convolutional GRU cell
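
For orientation, a convolutional GRU cell is the standard GRU recurrence with the dense projections swapped for 1d convolutions, so the hidden state keeps a spatial axis. A generic, self-contained sketch of that idea (not the internals of ConvGRU1DCell):

import torch
import torch.nn as nn

class SketchConvGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size, kernel_size):
        super().__init__()
        padding = kernel_size // 2
        # update (z) and reset (r) gates, computed jointly from [x, h]
        self.gates = nn.Conv1d(input_size + hidden_size, 2 * hidden_size,
                               kernel_size, padding=padding)
        # candidate hidden state from [x, r * h]
        self.cand = nn.Conv1d(input_size + hidden_size, hidden_size,
                              kernel_size, padding=padding)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

cell = SketchConvGRUCell(input_size=7, hidden_size=32, kernel_size=3)
h = cell(torch.randn(4, 7, 512), torch.zeros(4, 32, 512))   # (batch, hidden_size, length)
print(h.shape)                                              # torch.Size([4, 32, 512])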

x = torch.randn((4,7,512,3600))

convgru = ConvBiGRU(input_size=7, hidden_sizes=32, kernel_sizes=3, n_layers=1, d_model=512, predict_every_n_patches=5, n_classes=5)

out = convgru(x)
out.shape
torch.Size([4, 5, 720])