def DecoderFeedForward(
    c_in,                     # the number of input channels
    predict_every_n_patches,  # for a given sequence of length m with frequency f, number of predictions
    num_layers,
    d_ff,
    attn_dropout,
    res_attention,
    pre_norm,
    store_attn,
    n_heads,
    shared_embedding,
    affine,
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    d_model,                  # the dimension of the transformer model
    norm:str='BatchNorm',     # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',           # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,        # dropout in between linear layers
):
Transformer decoder head with attention for feed-forward predictions. It is essentially another encoder layer followed by a linear layer, a 1d convolution, and a softmax; it can be useful for linear probing.
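The pipeline described above (encoder layer, then linear layer, then 1d convolution, then softmax) can be sketched in plain PyTorch. This is an illustrative stand-in, not the library's actual `DecoderFeedForward`; the class name, layer sizes, and shapes are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the head described above: one transformer encoder
# layer, a linear projection, a 1d convolution over classes, and a softmax.
class FeedForwardDecoderSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=128, n_classes=6):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=d_ff, batch_first=True)
        self.linear = nn.Linear(d_model, d_model)
        self.conv = nn.Conv1d(d_model, n_classes, kernel_size=1)

    def forward(self, x):                      # x: [bs, num_patch, d_model]
        z = self.linear(self.encoder(x))       # [bs, num_patch, d_model]
        logits = self.conv(z.transpose(1, 2))  # [bs, n_classes, num_patch]
        return logits.softmax(dim=1)           # class probabilities per patch

m = FeedForwardDecoderSketch()
out = m(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 6, 10])
```

Because the softmax runs over the class dimension, each patch's probabilities sum to one.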
def ConvolutionalClassifier(
    c_in,                   # the number of input channels
    frequency,              # the frequency of the original channels
    predict_every_seconds,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,              # the number of classes to predict (for sleep stage - there are 6)
    win_length,             # the convolved patch length; the first step is a linear layer to this dimension
    d_model,                # the dimension of the transformer model
    affine:bool=False,
    shared_embedding:bool=True,
):
def TimeDistributedConvolutionalFeedForward(
    c_in,         # the number of input channels
    kernel_size,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,    # the number of classes to predict (for sleep stage - there are 6)
    d_model,      # the dimension of the transformer model
    affine:bool=False,
    dropout:float=0.0,
    shared_embedding:bool=True,
):
Convolutional feed-forward head that first uses a linear layer to project features into the original convolutional dimension. A transposed convolution then upsamples the data back toward its original length, and a final convolution predicts the classes.
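The three-stage pipeline just described (linear projection, transposed-convolution upsampling, class convolution) can be sketched as follows. This is a minimal illustration under assumed sizes, not the library's implementation; `conv_dim` and `kernel_size=4` are invented for the example.

```python
import torch
import torch.nn as nn

# Illustrative sketch: project features to a convolutional dimension,
# upsample in time with ConvTranspose1d, then predict classes with Conv1d.
class ConvFeedForwardSketch(nn.Module):
    def __init__(self, d_model=64, conv_dim=32, n_classes=6, kernel_size=4):
        super().__init__()
        self.proj = nn.Linear(d_model, conv_dim)
        self.upsample = nn.ConvTranspose1d(conv_dim, conv_dim,
                                           kernel_size, stride=kernel_size)
        self.head = nn.Conv1d(conv_dim, n_classes, kernel_size=1)

    def forward(self, x):                  # x: [bs, num_patch, d_model]
        z = self.proj(x).transpose(1, 2)   # [bs, conv_dim, num_patch]
        z = self.upsample(z)               # [bs, conv_dim, num_patch * 4]
        return self.head(z)                # [bs, n_classes, num_patch * 4]

out = ConvFeedForwardSketch()(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 6, 40])
```

With `stride == kernel_size`, the transposed convolution stretches the patch axis by exactly that factor, which is how the head recovers a higher temporal resolution.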
def LinearProbingHead(
    c_in,                     # the number of input channels in the original input
    predict_every_n_patches,  # for a given sequence of length m with frequency f, number of predictions
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    input_size,               # the dimension of the transformer model
    n_layers,                 # the number of linear layers to use in the prediction head, with ReLU activation and dropout
    num_patch,
    shared_embedding:bool=True,  # whether to have one dense layer shared across channels or one layer per channel
    affine:bool=True,            # include learnable parameters to weight predictions
    norm:str='BatchNorm',        # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',              # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,           # dropout in between linear layers
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
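The per-patch, per-channel scoring and averaging described above can be sketched with a single shared linear layer. This is an assumed simplification of the head, not the actual class; the shapes are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of the averaging idea: one shared linear layer scores
# every (channel, patch) position, then logits are averaged over both
# the channel and patch axes to give one prediction per example.
class LinearProbeSketch(nn.Module):
    def __init__(self, d_model=64, n_classes=6):
        super().__init__()
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):               # x: [bs, n_vars, num_patch, d_model]
        logits = self.fc(x)             # [bs, n_vars, num_patch, n_classes]
        return logits.mean(dim=(1, 2))  # average over channels and patches

out = LinearProbeSketch()(torch.randn(4, 7, 12, 64))
print(out.shape)  # torch.Size([4, 6])
```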
def LinearProbingHeadBinaryOutcome(
    c_in,       # the number of input channels in the original input
    n_patches,  # the number of stft or time patches
    patch_len,
    d_model,    # the dimension of the transformer model
    norm:str='BatchNorm',  # batchnorm or layernorm between linear and convolutional layers
    act:str='gelu',        # activation function to use between layers, 'gelu' or 'relu'
    dropout:float=0.0,     # dropout in between linear layers
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
m = LinearProbingHeadBinaryOutcome(c_in=7, d_model=512, n_patches=3600, dropout=0.1)
x = torch.randn((4, 7, 512, 3600))
m(x, return_softmax=False).shape
def LogisticRegression(
    c_in,        # the number of input channels in the original input
    n_classes,   # the number of classes to predict (for sleep stage - there are 6)
    input_size,  # the dimension of the transformer model
    n_patches,
    dropout:float=0.0,
):
A linear probing head (with optional MLP). It assumes that d_model corresponds to a particular segment of time, makes a prediction per patch per channel, and averages the results.
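A logistic-regression probe of this kind reduces to a single linear layer over the flattened features. The sketch below is a bare-bones illustration under assumed shapes, not the library's `LogisticRegression` class.

```python
import torch
import torch.nn as nn

# Assumed shapes: [bs, c_in, n_patches, input_size] flattened per example,
# then a single linear layer produces one logit per class.
probe = nn.Sequential(
    nn.Flatten(),                    # [bs, c_in * n_patches * input_size]
    nn.Dropout(0.0),                 # mirrors the dropout parameter above
    nn.Linear(7 * 12 * 64, 6),       # c_in=7, n_patches=12, input_size=64
)
logits = probe(torch.randn(4, 7, 12, 64))
print(logits.shape)  # torch.Size([4, 6])
```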
Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes::
    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will also have their parameters converted when you call :meth:`to`, etc.
.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.
:ivar training: Boolean representing whether this module is in training or evaluation mode.
:vartype training: bool
# [bs x nvars x d_model x num_patch]
x = torch.randn(4*7, 50, 256)
cls = AttentiveClassifier(embed_dim=256, c_in=7, num_queries=1, num_heads=2,
                          mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                          init_std=0.02, qkv_bias=True, num_classes=1,
                          complete_block=False, per_channel=False)
cls(x).shape
torch.Size([28, 1, 256])
torch.Size([4, 1])
# [bs x nvars x d_model x num_patch]
cls_ = AttentiveClassifier(embed_dim=768, num_queries=1, num_heads=8,
                           mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                           init_std=0.02, qkv_bias=True, num_classes=5,
                           complete_block=False)
sum(p.numel() for p in cls_.parameters())
2390021
# [bs x nvars x d_model x num_patch]
x = torch.randn(4, 7*10, 256)
cls_ = AttentiveClassifierNoMelt(embed_dim=256, num_queries=1, num_heads=2,
                                 mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                                 init_std=0.02, qkv_bias=True, num_classes=1,
                                 complete_block=True)
cls_(x).shape
torch.Size([4, 1])
from wavefm.nested import flatten_dim_to_batch

d_model = 256
n_heads = 2
batch_size = 4
n_vars = 7
max_len = 100

# Create sequences of different lengths
seq_lens = torch.randint(50, max_len, (batch_size,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(length, n_vars, d_model) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)
x_nested = x_nested.transpose(1, 2)  # [bs x nvars x d_model x num_patch]
x_nested = flatten_dim_to_batch(x_nested, 1)
print(x_nested.shape)

# Forward pass
#| notest
# [bs x nvars x d_model x num_patch]
cls_ = AttentiveClassifier(embed_dim=256, num_queries=1, num_heads=4,
                           mlp_ratio=4.0, depth=1, norm_layer=nn.LayerNorm,
                           init_std=0.02, qkv_bias=True, num_classes=1,
                           complete_block=True, per_channel=False)
output = cls_(x_nested)
output.shape  # .transpose(1, 2).flatten(-2).shape
# from wavefm.loss import multilabel_cox_loss
# multilabel_cox_loss(output, torch.tensor([1,0,1,1]), torch.tensor([10,20,23,21]), n_labels=1).backward()
torch.Size([28, j2, 256])
def TimeDistributedFeedForward(
    c_in,                     # the number of input channels
    n_classes,                # the number of classes to predict (for sleep stage - there are 6)
    n_patches,                # the number of stft or time patches
    d_model,                  # the dimension of the transformer model
    pred_len_seconds,         # the sequence multiclass prediction length in seconds
    n_linear_layers,          # the number of linear layers to use in the prediction head, with ReLU activation and dropout
    conv_kernel_stride_size,  # the 1d convolution kernel size and stride, in seconds; for a prediction every 30 seconds, put 30 here
    dropout:float=0.0,        # dropout in between linear layers
):
Feed-forward head that uses a convolutional layer to reduce channel dimensionality, followed by a feed-forward network to make the class predictions.
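The shape logic of such a head can be sketched as follows: a strided 1d convolution collapses each prediction window, then a small MLP maps each window to class logits. This is an assumed illustration, not the library's `TimeDistributedFeedForward`; the two-patch window is an invented example value.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: merge (channel, feature) into one conv channel axis,
# collapse every `window` patches with a strided Conv1d, then classify each
# window with a small ReLU/dropout MLP.
class TimeDistributedSketch(nn.Module):
    def __init__(self, c_in=7, d_model=64, n_classes=6, window=2, dropout=0.1):
        super().__init__()
        self.conv = nn.Conv1d(c_in * d_model, d_model,
                              kernel_size=window, stride=window)
        self.mlp = nn.Sequential(nn.ReLU(), nn.Dropout(dropout),
                                 nn.Linear(d_model, n_classes))

    def forward(self, x):                   # x: [bs, c_in, d_model, num_patch]
        z = self.conv(x.flatten(1, 2))      # [bs, d_model, num_patch // window]
        return self.mlp(z.transpose(1, 2))  # [bs, num_patch // window, n_classes]

out = TimeDistributedSketch()(torch.randn(4, 7, 64, 10))
print(out.shape)  # torch.Size([4, 5, 6])
```

Setting kernel size equal to stride makes the windows non-overlapping, so ten patches with a two-patch window yield five predictions.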
def ConvGRU1D(
    input_size,
    hidden_sizes,  # if integer, the same hidden size is used for all cells
    kernel_sizes,  # if integer, the same kernel size is used for all cells
    n_layers,
):
Base class for all neural network modules.
Your models should also subclass this class.
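A single cell of a convolutional GRU of this kind might look as follows. This is one possible sketch of the standard ConvGRU gating, not the library's `ConvGRU1D` code; the gate layout is an assumption.

```python
import torch
import torch.nn as nn

# Assumed ConvGRU cell: standard GRU update/reset/candidate gates, but each
# gate is a Conv1d over the concatenated input and hidden state, so the
# hidden state keeps a spatial (length) axis instead of being a flat vector.
class ConvGRUCell1D(nn.Module):
    def __init__(self, input_size, hidden_size, kernel_size):
        super().__init__()
        pad = kernel_size // 2  # "same" padding for odd kernel sizes
        self.gates = nn.Conv1d(input_size + hidden_size, 2 * hidden_size,
                               kernel_size, padding=pad)
        self.cand = nn.Conv1d(input_size + hidden_size, hidden_size,
                              kernel_size, padding=pad)

    def forward(self, x, h):       # x: [bs, input_size, L], h: [bs, hidden_size, L]
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)  # update gate z, reset gate r
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * n + z * h

cell = ConvGRUCell1D(input_size=3, hidden_size=8, kernel_size=3)
h = cell(torch.randn(2, 3, 50), torch.zeros(2, 8, 50))
print(h.shape)  # torch.Size([2, 8, 50])
```

Stacking `n_layers` such cells, with each layer's hidden state feeding the next, gives the multi-layer structure the signature's `hidden_sizes` and `kernel_sizes` lists suggest.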