def DecoderFeedForward(
    c_in,                     # the number of input channels
    predict_every_n_patches,  # for a given sequence of length m with frequency f, number of predictions
    num_layers,
    d_ff,
    attn_dropout,
    res_attention,
    pre_norm,
    store_attn,
    n_heads,
    shared_embedding,
    affine,
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    d_model,                  # the dimension of the transformer model
    norm:str='BatchNorm',     # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',           # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,        # dropout in between linear layers
):
Transformer decoder head with attention for feed-forward predictions. It is essentially another encoder layer followed by a linear layer, a 1d convolution, and a softmax; it can be useful for linear probing.
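The pipeline described above (encoder layer, then linear layer, then 1d convolution, then softmax) can be sketched in plain PyTorch. This is an illustrative stand-in, not the library's actual `DecoderFeedForward`; the class name, layer sizes, and shapes are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the head described above: one transformer encoder
# layer, a linear projection, a 1d convolution over classes, and a softmax.
class FeedForwardDecoderSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=128, n_classes=6):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=d_ff, batch_first=True)
        self.linear = nn.Linear(d_model, d_model)
        self.conv = nn.Conv1d(d_model, n_classes, kernel_size=1)

    def forward(self, x):                      # x: [bs, num_patch, d_model]
        z = self.linear(self.encoder(x))       # [bs, num_patch, d_model]
        logits = self.conv(z.transpose(1, 2))  # [bs, n_classes, num_patch]
        return logits.softmax(dim=1)           # class probabilities per patch

m = FeedForwardDecoderSketch()
out = m(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 6, 10])
```

Because the softmax runs over the class dimension, each patch's probabilities sum to one.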
def ConvolutionalClassifier(
    c_in,                   # the number of input channels
    frequency,              # the frequency of the original channels
    predict_every_seconds,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,              # the number of classes to predict (for sleep stage - there are 6)
    win_length,             # the convolved patch length; the first step is a linear layer to this dimension
    d_model,                # the dimension of the transformer model
    affine:bool=False,
    shared_embedding:bool=True,
):
def TimeDistributedConvolutionalFeedForward(
    c_in,         # the number of input channels
    kernel_size,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,    # the number of classes to predict (for sleep stage - there are 6)
    d_model,      # the dimension of the transformer model
    affine:bool=False,
    dropout:float=0.0,
    shared_embedding:bool=True,
):
Convolutional feed-forward head that first uses a linear layer to project features into the original convolutional dimension. A transposed convolution then upsamples the data back toward its original length, and a final convolution predicts the classes.
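The three-stage pipeline just described (linear projection, transposed-convolution upsampling, class convolution) can be sketched as follows. This is a minimal illustration under assumed sizes, not the library's implementation; `conv_dim` and `kernel_size=4` are invented for the example.

```python
import torch
import torch.nn as nn

# Illustrative sketch: project features to a convolutional dimension,
# upsample in time with ConvTranspose1d, then predict classes with Conv1d.
class ConvFeedForwardSketch(nn.Module):
    def __init__(self, d_model=64, conv_dim=32, n_classes=6, kernel_size=4):
        super().__init__()
        self.proj = nn.Linear(d_model, conv_dim)
        self.upsample = nn.ConvTranspose1d(conv_dim, conv_dim,
                                           kernel_size, stride=kernel_size)
        self.head = nn.Conv1d(conv_dim, n_classes, kernel_size=1)

    def forward(self, x):                  # x: [bs, num_patch, d_model]
        z = self.proj(x).transpose(1, 2)   # [bs, conv_dim, num_patch]
        z = self.upsample(z)               # [bs, conv_dim, num_patch * 4]
        return self.head(z)                # [bs, n_classes, num_patch * 4]

out = ConvFeedForwardSketch()(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 6, 40])
```

With `stride == kernel_size`, the transposed convolution stretches the patch axis by exactly that factor, which is how the head recovers a higher temporal resolution.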
def LinearProbingHead(
    c_in,                     # the number of input channels in the original input
    predict_every_n_patches,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    input_size,               # the dimension of the transformer model
    n_layers,                 # the number of linear layers to use in the prediction head, with ReLU activation and dropout
    num_patch,
    shared_embedding:bool=True,  # whether to have one dense layer shared across channels or one layer per channel
    affine:bool=True,            # include learnable parameters to weight predictions
    norm:str='BatchNorm',        # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',              # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,           # dropout in between linear layers
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
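The per-patch, per-channel scoring and averaging described above can be sketched with a single shared linear layer. This is an assumed simplification of the head, not the actual class; the shapes are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of the averaging idea: one shared linear layer scores
# every (channel, patch) position, then logits are averaged over both
# the channel and patch axes to give one prediction per example.
class LinearProbeSketch(nn.Module):
    def __init__(self, d_model=64, n_classes=6):
        super().__init__()
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):               # x: [bs, n_vars, num_patch, d_model]
        logits = self.fc(x)             # [bs, n_vars, num_patch, n_classes]
        return logits.mean(dim=(1, 2))  # average over channels and patches

out = LinearProbeSketch()(torch.randn(4, 7, 12, 64))
print(out.shape)  # torch.Size([4, 6])
```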
def LinearProbingHeadBinaryOutcome(
    c_in,       # the number of input channels in the original input
    n_patches,  # the number of stft or time patches
    patch_len,
    d_model,    # the dimension of the transformer model
    norm:str='BatchNorm',  # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',        # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,     # dropout in between linear layers
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
m = LinearProbingHeadBinaryOutcome(c_in=7, d_model=512, n_patches=3600, dropout=0.1)
x = torch.randn((4, 7, 512, 3600))
m(x, return_softmax=False).shape
def LogisticRegression(
    c_in,        # the number of input channels in the original input
    n_classes,   # the number of classes to predict (for sleep stage - there are 6)
    input_size,  # the dimension of the transformer model
    n_patches,
    dropout:float=0.0,
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
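A logistic-regression probe of this kind reduces to a single linear layer over the flattened features. The sketch below is a bare-bones illustration under assumed shapes, not the library's `LogisticRegression` class.

```python
import torch
import torch.nn as nn

# Assumed shapes: [bs, c_in, n_patches, input_size] flattened per example,
# then a single linear layer produces one logit per class.
probe = nn.Sequential(
    nn.Flatten(),                    # [bs, c_in * n_patches * input_size]
    nn.Dropout(0.0),                 # mirrors the dropout parameter above
    nn.Linear(7 * 12 * 64, 6),       # c_in=7, n_patches=12, input_size=64
)
logits = probe(torch.randn(4, 7, 12, 64))
print(logits.shape)  # torch.Size([4, 6])
```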
Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes::
    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will also have their parameters converted when you call :meth:`to`, etc.
.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.
:ivar training: Boolean representing whether this module is in training or evaluation mode.
:vartype training: bool
# [bs x nvars x d_model x num_patch]
x = torch.randn(4*7, 50, 256)
cls = AttentiveClassifier(embed_dim=256, c_in=7, num_queries=1, num_heads=2,
                          mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                          init_std=0.02, qkv_bias=True, num_classes=1,
                          complete_block=False, per_channel=False)
cls(x).shape
torch.Size([28, 1, 256])
torch.Size([4, 1])
# [bs x nvars x d_model x num_patch]
cls_ = AttentiveClassifier(embed_dim=768, num_queries=1, num_heads=8,
                           mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                           init_std=0.02, qkv_bias=True, num_classes=5,
                           complete_block=False)
sum(p.numel() for p in cls_.parameters())
2390021
# [bs x nvars x d_model x num_patch]
x = torch.randn(4, 7*10, 256)
cls_ = AttentiveClassifierNoMelt(embed_dim=256, num_queries=1, num_heads=2,
                                 mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                                 init_std=0.02, qkv_bias=True, num_classes=1,
                                 complete_block=True)
cls_(x).shape
torch.Size([4, 1])
from wavefm.nested import flatten_dim_to_batch

d_model = 256
n_heads = 2
batch_size = 4
n_vars = 7
max_len = 100

# Create sequences of different lengths
seq_lens = torch.randint(50, max_len, (batch_size,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(length, n_vars, d_model) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)
x_nested = x_nested.transpose(1, 2)  # [bs x nvars x d_model x num_patch]
x_nested = flatten_dim_to_batch(x_nested, 1)
print(x_nested.shape)

# Forward pass
#| notest
# [bs x nvars x d_model x num_patch]
cls_ = AttentiveClassifier(embed_dim=256, num_queries=1, num_heads=4,
                           mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                           init_std=0.02, qkv_bias=True, num_classes=1,
                           complete_block=True, per_channel=False)
output = cls_(x_nested)
output.shape  # .transpose(1, 2).flatten(-2).shape
# from wavefm.loss import multilabel_cox_loss
# multilabel_cox_loss(output, torch.tensor([1,0,1,1]), torch.tensor([10,20,23,21]), n_labels=1).backward()
torch.Size([28, j2, 256])
def TimeDistributedFeedForward(
    c_in,                     # the number of input channels
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    n_patches,                # the number of stft or time patches
    d_model,                  # the dimension of the transformer model
    pred_len_seconds,         # the sequence multiclass prediction length in seconds
    n_linear_layers,          # the number of linear layers to use in the prediction head, with ReLU activation and dropout
    conv_kernel_stride_size,  # the 1d convolution kernel size and stride, in seconds; for a prediction every 30 seconds, put 30 here
    dropout:float=0.0,        # dropout in between linear layers
):
Feed-forward head that uses a convolutional layer to reduce channel dimensionality, followed by a feed-forward network to make the class predictions.
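The shape logic of such a head can be sketched as follows: a strided 1d convolution collapses each prediction window, then a small MLP maps each window to class logits. This is an assumed illustration, not the library's `TimeDistributedFeedForward`; the two-patch window is an invented example value.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: merge (channel, feature) into one conv channel axis,
# collapse every `window` patches with a strided Conv1d, then classify each
# window with a small ReLU/dropout MLP.
class TimeDistributedSketch(nn.Module):
    def __init__(self, c_in=7, d_model=64, n_classes=6, window=2, dropout=0.1):
        super().__init__()
        self.conv = nn.Conv1d(c_in * d_model, d_model,
                              kernel_size=window, stride=window)
        self.mlp = nn.Sequential(nn.ReLU(), nn.Dropout(dropout),
                                 nn.Linear(d_model, n_classes))

    def forward(self, x):                   # x: [bs, c_in, d_model, num_patch]
        z = self.conv(x.flatten(1, 2))      # [bs, d_model, num_patch // window]
        return self.mlp(z.transpose(1, 2))  # [bs, num_patch // window, n_classes]

out = TimeDistributedSketch()(torch.randn(4, 7, 64, 10))
print(out.shape)  # torch.Size([4, 5, 6])
```

Setting kernel size equal to stride makes the windows non-overlapping, so ten patches with a two-patch window yield five predictions.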
def ConvGRU1D(
    input_size,
    hidden_sizes,  # if integer, the same hidden size is used for all cells
    kernel_sizes,  # if integer, the same kernel size is used for all cells
    n_layers,
):
Base class for all neural network modules.
Your models should also subclass this class.
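A single cell of a convolutional GRU of this kind might look as follows. This is one possible sketch of the standard ConvGRU gating, not the library's `ConvGRU1D` code; the gate layout is an assumption.

```python
import torch
import torch.nn as nn

# Assumed ConvGRU cell: standard GRU update/reset/candidate gates, but each
# gate is a Conv1d over the concatenated input and hidden state, so the
# hidden state keeps a spatial (length) axis instead of being a flat vector.
class ConvGRUCell1D(nn.Module):
    def __init__(self, input_size, hidden_size, kernel_size):
        super().__init__()
        pad = kernel_size // 2  # "same" padding for odd kernel sizes
        self.gates = nn.Conv1d(input_size + hidden_size, 2 * hidden_size,
                               kernel_size, padding=pad)
        self.cand = nn.Conv1d(input_size + hidden_size, hidden_size,
                              kernel_size, padding=pad)

    def forward(self, x, h):       # x: [bs, input_size, L], h: [bs, hidden_size, L]
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)  # update gate z, reset gate r
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * n + z * h

cell = ConvGRUCell1D(input_size=3, hidden_size=8, kernel_size=3)
h = cell(torch.randn(2, 3, 50), torch.zeros(2, 8, 50))
print(h.shape)  # torch.Size([2, 8, 50])
```

Stacking `n_layers` such cells, with each layer's hidden state feeding the next, gives the multi-layer structure the signature's `hidden_sizes` and `kernel_sizes` lists suggest.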