Linear Probing and Fine-Tuning Heads
source
RNNProbingHead
RNNProbingHead (c_in, input_size, hidden_size, n_classes,
contrastive=False, module='GRU', linear_dropout=0.0,
rnn_dropout=0.0, num_rnn_layers=1, act='gelu',
pool='average', temperature=2.0, n_linear_layers=1,
predict_every_n_patches=1, bidirectional=True,
affine=False, shared_embedding=True, augmentations=None,
augmentation_mask_ratio=0.0,
augmentation_dims_to_shuffle=[1, 2, 3], norm=None)
*Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:`to`, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  |  |
| input_size |  |  |  |
| hidden_size |  |  |  |
| n_classes |  |  |  |
| contrastive | bool | False |  |
| module | str | GRU |  |
| linear_dropout | float | 0.0 |  |
| rnn_dropout | float | 0.0 |  |
| num_rnn_layers | int | 1 |  |
| act | str | gelu |  |
| pool | str | average | ‘average’ or ‘max’ or ‘majority’ |
| temperature | float | 2.0 | only used if pool=‘majority’ |
| n_linear_layers | int | 1 |  |
| predict_every_n_patches | int | 1 |  |
| bidirectional | bool | True |  |
| affine | bool | False |  |
| shared_embedding | bool | True |  |
| augmentations | NoneType | None |  |
| augmentation_mask_ratio | float | 0.0 |  |
| augmentation_dims_to_shuffle | list | [1, 2, 3] |  |
| norm | NoneType | None | one of [None, ‘pre’, ‘post’] |
source
RNNProbingHeadExperimental
RNNProbingHeadExperimental (c_in, input_size, hidden_size, n_classes,
contrastive=False, module='GRU',
linear_dropout=0.0, rnn_dropout=0.0,
num_rnn_layers=1, act='gelu', pool='average',
temperature=2.0, predict_every_n_patches=1,
bidirectional=True, affine=False,
augmentations=None,
augmentation_mask_ratio=0.0,
augmentation_dims_to_shuffle=[1, 2, 3],
pre_norm=True, mlp_final_head=False)
*Base class for all neural network modules.*
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  |  |
| input_size |  |  |  |
| hidden_size |  |  |  |
| n_classes |  |  |  |
| contrastive | bool | False | deprecated |
| module | str | GRU |  |
| linear_dropout | float | 0.0 |  |
| rnn_dropout | float | 0.0 |  |
| num_rnn_layers | int | 1 |  |
| act | str | gelu |  |
| pool | str | average | ‘average’ or ‘max’ or ‘majority’ |
| temperature | float | 2.0 | only used if pool=‘majority’ |
| predict_every_n_patches | int | 1 |  |
| bidirectional | bool | True |  |
| affine | bool | False |  |
| augmentations | NoneType | None |  |
| augmentation_mask_ratio | float | 0.0 |  |
| augmentation_dims_to_shuffle | list | [1, 2, 3] |  |
| pre_norm | bool | True | one of [None, ‘pre’, ‘post’] |
| mlp_final_head | bool | False |  |
import torch

# 4 sequences, 7 channels, 384-dim features, 960 patches; the last 32 patches are flagged as padding.
m = RNNProbingHeadExperimental(c_in=7,
                               pool='average',
                               input_size=384,
                               bidirectional=True,
                               affine=False,
                               hidden_size=1200,
                               module='GRU',
                               n_classes=4,
                               predict_every_n_patches=32,
                               rnn_dropout=0.,
                               num_rnn_layers=1,
                               linear_dropout=0.,
                               mlp_final_head=False,
                               pre_norm=True)
x = torch.randn((4, 7, 384, 960))
sequence_padding_mask = torch.zeros(4, 960)
sequence_padding_mask[:, -32:] = 1
m(x, return_softmax=True, sequence_padding_mask=sequence_padding_mask).shape
# Majority pooling with per-channel (non-shared) embeddings and post-norm.
m = RNNProbingHead(c_in=7, pool='majority', input_size=384, contrastive=False, bidirectional=True, affine=True,
                   shared_embedding=False, hidden_size=384, module='GRU', n_classes=4, predict_every_n_patches=32,
                   rnn_dropout=0., num_rnn_layers=1, linear_dropout=0., n_linear_layers=1, norm='post')
x = torch.randn((4, 7, 384, 960))
m(x, return_softmax=True).shape
# Contrastive head; note the doubled feature dimension (512 * 2) in the input.
m = RNNProbingHead(c_in=7, input_size=512, contrastive=True, bidirectional=True, affine=False, shared_embedding=True,
                   hidden_size=256, module='GRU', n_classes=5, predict_every_n_patches=5, rnn_dropout=0.,
                   num_rnn_layers=1, linear_dropout=0., n_linear_layers=1)
x = torch.randn((4, 7, 512 * 2, 3600))
m(x, return_softmax=True).shape
source
DecoderFeedForward
DecoderFeedForward (c_in, predict_every_n_patches, num_layers, d_ff,
attn_dropout, res_attention, pre_norm, store_attn,
n_heads, shared_embedding, affine, n_classes,
d_model, norm='BatchNorm', act='gelu', dropout=0.0)
Transformer decoder with attention for feed-forward predictions. This is really just another encoder layer followed by a linear layer, a 1-D convolution, and a softmax, but it can be useful for linear probing.
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  | the number of input channels |
| predict_every_n_patches |  |  | for a given sequence of length m with frequency f, number of predictions |
| num_layers |  |  |  |
| d_ff |  |  |  |
| attn_dropout |  |  |  |
| res_attention |  |  |  |
| pre_norm |  |  |  |
| store_attn |  |  |  |
| n_heads |  |  |  |
| shared_embedding |  |  |  |
| affine |  |  |  |
| n_classes |  |  | the number of classes to predict (for sleep stage - there are 6) |
| d_model |  |  | the dimension of the transformer model |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, ‘gelu’ or ‘relu’ |
| dropout | float | 0.0 | dropout in between linear layers |
# Derive the number of patches for a 6-hour sequence sampled at 125 Hz,
# using 750-sample windows with no overlap.
c_in = 7
frequency = 125
win_length = 750
overlap = 0.
hop_length = win_length - int(overlap * win_length)
max_seq_len_sec = 6 * 3600  # for dataloader
# seq_len_sec = sample_stride = 3 * 3600  # for dataloader
max_seq_len = max_seq_len_sec * frequency  # for model
# n_patches = n_fft // 2 + 1
n_patches = (max(max_seq_len, win_length) - win_length) // hop_length + 1
# patch_len = int((win_length - conv_kernel_stride_size[1]) / conv_kernel_stride_size[1] + 1)

x = torch.randn(2, c_in, 512, n_patches)
model = DecoderFeedForward(c_in=c_in,
                           predict_every_n_patches=5,
                           num_layers=1,
                           d_ff=2048,
                           attn_dropout=0.,
                           res_attention=False,
                           pre_norm=False,
                           store_attn=False,
                           n_heads=2,
                           affine=False,
                           shared_embedding=False,
                           n_classes=5,
                           d_model=512,
                           norm='BatchNorm',
                           act='gelu',
                           dropout=0.)
model(x).shape
source
TimeDistributedConvolutionalFeedForward
TimeDistributedConvolutionalFeedForward (c_in, frequency,
predict_every_seconds,
n_classes, win_length, d_model,
affine=False,
shared_embedding=True)
Convolutional feed-forward head. A linear feed-forward network first projects the features into the original convolutional dimension, a transposed convolution then expands the data back toward its original form, and a final convolution predicts the classes.
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  | the number of input channels |
| frequency |  |  | the sampling frequency of the original channels |
| predict_every_seconds |  |  | for a given sequence of length m with frequency f, number of predictions |
| n_classes |  |  | the number of classes to predict (for sleep stage - there are 6) |
| win_length |  |  | the convolved patch length; the first step is a linear layer projecting to this dimension |
| d_model |  |  | the dimension of the transformer model |
| affine | bool | False |  |
| shared_embedding | bool | True |  |
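There is no usage example for this head on the page, so the following is only a minimal sketch. The argument values (7 channels, 125 Hz, 30-second predictions, 750-sample windows, 512-dim features), the (batch, channels, d_model, n_patches) input layout, and the bare m(x) call are assumptions carried over from the other heads above rather than documented behaviour.

import torch

# Hypothetical sketch: argument values and input layout are assumptions, not documented defaults.
m = TimeDistributedConvolutionalFeedForward(c_in=7,
                                            frequency=125,
                                            predict_every_seconds=30,
                                            n_classes=5,
                                            win_length=750,
                                            d_model=512,
                                            affine=False,
                                            shared_embedding=True)
x = torch.randn((4, 7, 512, 3600))  # (batch, channels, d_model, n_patches), by analogy with LinearProbingHead
m(x).shape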
source
LinearProbingHead
LinearProbingHead (c_in, predict_every_n_patches, n_classes, input_size,
n_layers, num_patch, shared_embedding=True,
affine=True, norm='BatchNorm', act='gelu',
dropout=0.0)
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  | the number of input channels in the original input |
| predict_every_n_patches |  |  | for a given sequence of length m with frequency f, number of predictions |
| n_classes |  |  | the number of classes to predict (for sleep stage - there are 6) |
| input_size |  |  | the dimension of the transformer model |
| n_layers |  |  | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| num_patch |  |  |  |
| shared_embedding | bool | True | whether to share one dense layer across all channels or use a separate layer per channel |
| affine | bool | True | include learnable parameters to weight predictions |
| norm | str | BatchNorm | batchnorm or layernorm between linear and convolutional layers |
| act | str | gelu | activation function to use between layers, ‘gelu’ or ‘relu’ |
| dropout | float | 0.0 | dropout in between linear layers |
# 3-layer probing MLP over 512-dim features, 3600 patches, 7 channels.
m = LinearProbingHead(c_in=7,
                      input_size=512,
                      predict_every_n_patches=5,
                      n_classes=5,
                      n_layers=3,
                      shared_embedding=True,
                      affine=True,
                      num_patch=3600,
                      dropout=0.1)
x = torch.randn((4, 7, 512, 3600))
m(x, return_softmax=True).shape
source
TimeDistributedFeedForward
TimeDistributedFeedForward (c_in, n_classes, n_patches, d_model,
pred_len_seconds, n_linear_layers,
conv_kernel_stride_size, dropout=0.0)
Feed-forward head that uses a convolutional layer to reduce channel dimensionality, followed by a feed-forward network to make the predictions.
|  | Type | Default | Details |
|---|---|---|---|
| c_in |  |  | the number of input channels |
| n_classes |  |  | the number of classes to predict (for sleep stage - there are 6) |
| n_patches |  |  | the number of STFT or time patches |
| d_model |  |  | the dimension of the transformer model |
| pred_len_seconds |  |  | the sequence multiclass prediction length in seconds |
| n_linear_layers |  |  | the number of linear layers to use in the prediction head, with ReLU activation and dropout |
| conv_kernel_stride_size |  |  | the 1d convolution kernel size and stride, in seconds; if you want predictions every 30 seconds, put 30 here |
| dropout | float | 0.0 | dropout in between linear layers |
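As with the previous head, no example is given here, so the sketch below is illustrative only. The values mirror the DecoderFeedForward example above (7 channels, 512-dim features, 3600 patches covering a 6-hour sequence), and both the conv_kernel_stride_size=30 value (taken from the parameter note) and the bare m(x) call are assumptions rather than documented usage.

import torch

# Hypothetical sketch; argument values and the call convention are assumptions.
m = TimeDistributedFeedForward(c_in=7,
                               n_classes=5,
                               n_patches=3600,
                               d_model=512,
                               pred_len_seconds=6 * 3600,   # 6-hour sequences, as in the DecoderFeedForward example
                               n_linear_layers=2,
                               conv_kernel_stride_size=30,  # per the parameter note; the code may instead expect a (kernel, stride) pair
                               dropout=0.)
x = torch.randn((2, 7, 512, 3600))  # (batch, channels, d_model, n_patches)
m(x).shape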
source
ConvBiGRU
ConvBiGRU (input_size, hidden_sizes, kernel_sizes, n_layers, d_model,
predict_every_n_patches, n_classes)
*Base class for all neural network modules.*
source
ConvGRU1D
ConvGRU1D (input_size, hidden_sizes, kernel_sizes, n_layers)
*Base class for all neural network modules.*
source
ConvGRU1DCell
ConvGRU1DCell (input_size, hidden_size, kernel_size)
Generate a convolutional GRU cell
# ConvBiGRU over a (batch, channels, d_model, patches) input, matching the probing-head examples above.
x = torch.randn((4, 7, 512, 3600))
convgru = ConvBiGRU(input_size=7, hidden_sizes=32, kernel_sizes=3, n_layers=1,
                    d_model=512, predict_every_n_patches=5, n_classes=5)
out = convgru(x)
out.shape
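The ConvGRU1DCell building block has no example of its own. Constructing it follows directly from the signature above, but its forward convention is not shown on this page; the call sketched below assumes a GRUCell-style step over a (batch, channels, length) layout and is left commented out as a reminder to check the source.

import torch

# Hypothetical sketch; only the constructor arguments come from the documented signature.
cell = ConvGRU1DCell(input_size=7, hidden_size=32, kernel_size=3)
x_t = torch.randn(4, 7, 960)   # (batch, input channels, length) -- assumed layout
h_t = torch.zeros(4, 32, 960)  # (batch, hidden channels, length) -- assumed layout
# h_next = cell(x_t, h_t)      # assumed GRUCell-style step; check the source for the actual signature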