Layers

Potentially helpful layers for your models

Miscellaneous


source

SeriesDecomposition


def SeriesDecomposition(
    kernel_size:int, # the size of the window
):

Series decomposition block
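
A minimal usage sketch (assuming the block returns seasonal and trend components, as in Autoformer-style decomposition; the [bs x n_vars x seq_len] layout is also an assumption):

import torch

sd = SeriesDecomposition(kernel_size=25)
x = torch.randn(2, 7, 100)   # [bs x n_vars x seq_len] (assumed layout)
seasonal, trend = sd(x)      # assumed return: (seasonal, trend), both input-shaped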


source

MovingAverage


def MovingAverage(
    kernel_size:int, # the size of the window
):

Moving average block to highlight the trend of time series
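
A minimal sketch, assuming the block smooths along the last dimension with a window of kernel_size while preserving the input shape:

import torch

ma = MovingAverage(kernel_size=25)
x = torch.randn(2, 7, 100)   # [bs x n_vars x seq_len] (assumed layout)
trend = ma(x)                # smoothed series, same shape as x (assumed)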


source

get_activation_fn


def get_activation_fn(
    activation
):

Return the activation function matching activation, which may be a string name or a callable.
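
A usage sketch; that string names such as 'relu' or 'gelu' are accepted is an assumption:

act = get_activation_fn('gelu')  # assumed: accepts a string name or a callable
act(torch.randn(3))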


source

Transpose


def Transpose(
    dims:VAR_POSITIONAL, contiguous:bool=False
):

Transpose the given dimensions of the input tensor, optionally making the result contiguous.
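
A quick sketch, assuming forward applies x.transpose(*dims):

t = Transpose(1, 2)   # swap dims 1 and 2
x = torch.randn(2, 7, 100)
t(x).shape
torch.Size([2, 100, 7])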


source

Identity


def Identity(
    
):

Return the input unchanged; useful as a placeholder or no-op layer.
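
For example (assuming a pass-through forward):

idn = Identity()
x = torch.randn(2, 7, 100)
torch.equal(idn(x), x)
True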

Linear Layers for Patches


source

PatchEncoder3D


def PatchEncoder3D(
    c_in, patch_size, # patch len per frame
    n_patches, # number of patches per frame
    tubelet_size, # how many frames to process at a time
    d_model
):

3D convolution for patched time series data, broken into frames.

m = PatchEncoder3D(c_in=7, patch_size=3, n_patches=12, tubelet_size=2, d_model=512)

x = torch.randn(4, 7, 10, 12, 3)  # [bs x c_in x n_frames x n_patches x patch_size]
m(x).shape                        # 10 frames / tubelet_size of 2 -> 5 tokens of dim d_model
torch.Size([4, 5, 512])

Positional Encoding Layers


source

PositionalEncoding


def PositionalEncoding(
    num_patch, # number of patches of time series or stft in input
    d_model, # dimension of patch embeddings
):

Add positional encodings for up to num_patch positions to patch embeddings of dimension d_model.

d_model = 512
n_heads = 8
batch_size = 2
n_vars = 7
max_len = 100

# Create sequences of different lengths
seq_lens = torch.randint(50, max_len, (batch_size,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(length, d_model) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)

p = PositionalEncoding(num_patch=max_len, d_model=d_model)
out = p(x_nested)

source

tAPE


def tAPE(
    d_model:int, # the embedding dimension
    seq_len:int, # the max. length of the incoming sequence or num patches
):

Time Absolute Position Encoding (tAPE), adapted from tsai.

d_model = 768
batch_size = 2
n_vars = 7
max_len = 14400

# Create sequences of different lengths
seq_lens = torch.randint(10000, max_len, (batch_size,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(length, d_model) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)

p = tAPE(seq_len=max_len, d_model=d_model)
out = p(x_nested)
out.shape
torch.Size([2, j15, 768])

Mask and Augmentation Layers


source

Mask


def Mask(
    mask_type, mask_ratio, return_mask:bool=True
):

Apply a mask_type masking operation (e.g. 'jitter_zero') to a random fraction mask_ratio of the input, optionally returning the mask.

torch.manual_seed(125)
m = Mask(mask_type='jitter_zero', mask_ratio=0.5)
x = torch.randn(9)

m(x), m(x)  # each call draws a new random mask

# random permutation via torch.randperm, the mechanism behind the shuffle-style augmentations below
import torch
x = [0, 1, 2, 3]
[x[i] for i in torch.randperm(len(x))]
[0, 1, 3, 2]

source

PatchAugmentations


def PatchAugmentations(
    augmentations:list=['patch_mask', 'jitter_zero_mask', 'reverse_sequence', 'shuffle_channels', 'channel_masking'],
    patch_mask_ratio:float=0.0, jitter_zero_mask_ratio:float=0.0
):

Apply the selected augmentations (patch masking, jitter/zero masking, sequence reversal, channel shuffling, channel masking) to patched time series.

x = torch.randn(4, 3600, 7, 750)

s = PatchAugmentations(augmentations=[], patch_mask_ratio=0.1, jitter_zero_mask_ratio=0.1)
s(x).shape
torch.equal(x, s(x))  # no augmentations selected, so the input passes through unchanged
True

source

EmbeddingAugmentations


def EmbeddingAugmentations(
    augmentations:list=['shuffle_dims', 'jitter_zero_mask', 'patch_mask'], dims_to_shuffle:list=[1, 2, 3],
    patch_mask_ratio:float=0.0, jitter_zero_mask_ratio:float=0.0
):

Apply the selected augmentations (dimension shuffling, jitter/zero masking, patch masking) to embedded time series.

x = torch.randn(4,7,512,3600)

s = EmbeddingAugmentations(augmentations=['jitter_zero_mask'], dims_to_shuffle=[1], patch_mask_ratio=0.1, jitter_zero_mask_ratio=0.1)
s(x).shape
torch.Size([4, 7, 512, 3600])

Patch and Fourier Layers


source

Patch3d


def Patch3d(
    frame_len, frame_stride, patch_len, patch_stride
):

Split a time series into frames of length frame_len with stride frame_stride, then split each frame into patches of length patch_len with stride patch_stride.
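
A hedged sketch of how this might be called; the input and output layouts are assumptions:

p3d = Patch3d(frame_len=500, frame_stride=250, patch_len=50, patch_stride=50)
x = torch.randn(2, 7, 2000)   # [bs x n_vars x seq_len] (assumed layout)
x_patched = p3d(x)            # assumed: frames along one dim, patches within each frame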


source

Patch


def Patch(
    patch_len, stride
):

Split a time series into patches of length patch_len with the given stride.
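
A hedged sketch, assuming Patch unfolds the last dimension into patches:

p = Patch(patch_len=16, stride=8)
x = torch.randn(2, 7, 128)    # [bs x n_vars x seq_len] (assumed layout)
p(x).shape                    # assumed: [2, 7, 15, 16], with (128 - 16)//8 + 1 = 15 patches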


source

STFT


def STFT(
    n_fft, win_length, hop_length, stft_norm, decibel_scale, channel_stft_means:NoneType=None,
    channel_stft_stds:NoneType=None, pad_win_length_to_nfft:bool=True, pad_mode:str='reflect', center:bool=False,
    return_complex:bool=True
):

Compute the short-time Fourier transform of the input, with optional normalization, decibel scaling, and per-channel standardization via channel_stft_means / channel_stft_stds.
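
A hedged sketch; the boolean values for stft_norm and decibel_scale and the input layout are assumptions:

stft = STFT(n_fft=256, win_length=256, hop_length=128,
            stft_norm=False, decibel_scale=False)
x = torch.randn(2, 7, 1024)   # [bs x n_vars x seq_len] (assumed layout)
spec = stft(x)                # complex spectrogram per channel (return_complex=True)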


source

FFT


def FFT(
    dim:int=-1, # dimension to calculate fft over
    norm:str='backward', # "forward" - normalize by 1/n, "backward" - no normalization, "ortho" - normalize by 1/sqrt(n) (making the FFT orthonormal)
):

Compute the fast Fourier transform of the input over dimension dim with the chosen normalization.

patch_len = 750
d_model = 12
n_vars = 7
max_len = 100
bs = 3
# Create sequences of different lengths
seq_lens = torch.randint(50, max_len, (bs,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(length, n_vars, patch_len) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)

f = FFT(dim=-1)
x = f(x_nested)
x.size(0), x.dim()
(3, 4)

Reversible Instance Normalization


source

RevIN


def RevIN(
    num_features:int, # the number of channels or features in the input
    eps:float=1e-05, # added to avoid division by zero errors
    dim_to_reduce:int=-1, # the dimension to reduce,
    affine:bool=True, # learning affine parameters bias and weight per channel
):

Reversible instance normalization (RevIN): normalize each instance over dim_to_reduce, with optional learnable affine parameters, and invert the transform to denormalize model outputs.

x = torch.randn(2, 7, 100)
revin = RevIN(7, dim_to_reduce=-1, affine=True)

x_norm = revin(x, mode=True)          # mode=True: normalize
x_denorm = revin(x_norm, mode=False)  # mode=False: denormalize (invert)
x_norm.shape, x_denorm.shape
(torch.Size([2, 7, 100]), torch.Size([2, 7, 100]))
batch_size = 2
n_vars = 7
max_len = 100

# Create sequences of different lengths
seq_lens = torch.randint(50, max_len, (batch_size,))

# Create input tensors with different sequence lengths
x_list = [torch.randn(n_vars, length) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)
print(x_nested.shape)

revin = RevIN(n_vars, dim_to_reduce=-1, affine=True)
x_norm = revin(x_nested, mode=True)
print(x_norm.shape)
x_denorm = revin(x_norm, mode=False)
print(x_denorm.shape)
torch.Size([2, 7, j26])
torch.Size([2, 7, j27])
torch.Size([2, 7, j28])

Masked Channel Tokens


source

LearnableMaskedChannelTokens


def LearnableMaskedChannelTokens(
    missing_channel_indices, d_model
):

Replace the embeddings of the channels listed in missing_channel_indices with learnable mask tokens of dimension d_model.

x = torch.randn(2,7,512,3600)
missing_channel_indices = [0,1]
learnable_mask_tokens = LearnableMaskedChannelTokens(missing_channel_indices, d_model=512)
learnable_mask_tokens(x).shape

seq_lens = torch.tensor([50, 100])

# Create input tensors with different sequence lengths
x_list = [torch.randn(n_vars,512, length) for length in seq_lens]
x_nested = torch.nested.as_nested_tensor(x_list, layout=torch.jagged)

out = learnable_mask_tokens(x_nested)
out.shape
torch.Size([2, 7, 512, j99])

Inception


source

InceptionBlock


def InceptionBlock(
    in_channels, bottleneck_channels:int=32, residual:bool=True, depth:int=6, groups:int=1, kwargs:VAR_KEYWORD
):

A stack of depth Inception modules, optionally with residual connections.
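
A hedged sketch; the output channel width depends on the internal module widths:

blk = InceptionBlock(in_channels=7, bottleneck_channels=32, residual=True, depth=6)
x = torch.randn(2, 7, 128)    # [bs x n_vars x seq_len]
y = blk(x)                    # assumed: [2, 4*bottleneck_channels, 128], following the tsai design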


source

InceptionModule


def InceptionModule(
    in_channels:int, bottleneck_channels:int=32, bottleneck:bool=True, kernel_size:int=40, groups:int=1
):

Inception module adapted from https://github.com/timeseriesAI/tsai/blob/main/tsai/models/InceptionTime.py
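
In the linked tsai design, an Inception module concatenates three convolutional branches and a max-pool branch, so the output has 4x the per-branch channel count; a sketch under that assumption:

m = InceptionModule(in_channels=7, bottleneck_channels=32, kernel_size=40)
x = torch.randn(2, 7, 128)
m(x).shape   # assumed: torch.Size([2, 128, 128]), i.e. 4 * 32 output channels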

Attention


source

MultiHeadAttention


def MultiHeadAttention(
    dim, num_heads:int=8, qkv_bias:bool=False, qk_scale:NoneType=None, attn_drop:float=0.0, proj_drop:float=0.0,
    rotary_pes:bool=False, max_n_patches_rotary:int=14500
):

Multi-head self-attention with optional rotary position embeddings (rotary_pes).


source

MLP


def MLP(
    in_features, hidden_features:NoneType=None, out_features:NoneType=None, act_layer:type=GELU, drop:float=0.0
):

A two-layer feed-forward network (linear, activation, dropout, linear, dropout) with configurable hidden and output dimensions.
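
A transformer-style feed-forward usage sketch, assuming the conventional linear-activation-dropout-linear stack:

mlp = MLP(in_features=512, hidden_features=2048, out_features=512, drop=0.1)
x = torch.randn(2, 50, 512)
mlp(x).shape   # torch.Size([2, 50, 512])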


source

ScaledDotProductAttention


def ScaledDotProductAttention(
    d_model, n_heads, attn_dropout:float=0.0, res_attention:bool=False, lsa:bool=False
):

Scaled Dot-Product Attention module (Attention Is All You Need, Vaswani et al., 2017), with optional residual attention from the previous layer (RealFormer: Transformer Likes Residual Attention, He et al., 2020) and locality self-attention (Vision Transformer for Small-Size Datasets, Lee et al., 2021).


source

MultiheadAttentionCustom


def MultiheadAttentionCustom(
    d_model, n_heads, d_k:NoneType=None, d_v:NoneType=None, res_attention:bool=False, attn_dropout:float=0.0,
    proj_dropout:float=0.0, qkv_bias:bool=True, lsa:bool=False
):

Multi-head attention built from separate Q/K/V projections and ScaledDotProductAttention, with optional residual attention and locality self-attention.

mha_attn = MultiheadAttentionCustom(d_model=512, n_heads=8, attn_dropout=0., proj_dropout=0., res_attention=False)
mha_attn
MultiheadAttentionCustom(
  (W_Q): Linear(in_features=512, out_features=512, bias=True)
  (W_K): Linear(in_features=512, out_features=512, bias=True)
  (W_V): Linear(in_features=512, out_features=512, bias=True)
  (sdp_attn): ScaledDotProductAttention(
    (attn_dropout): Dropout(p=0.0, inplace=False)
  )
  (to_out): Sequential(
    (0): Linear(in_features=512, out_features=512, bias=True)
    (1): Dropout(p=0.0, inplace=False)
  )
)
def test_attention_equivalence():
    # Set random seed for reproducibility
    torch.manual_seed(42)
    
    # Test parameters
    batch_size = 2
    seq_len = 10
    d_model = 64
    n_heads = 4
    
    # Create input tensor (only need one since we're using self-attention)
    x = torch.randn(batch_size, seq_len, d_model)
    
    # Create key padding mask
    key_padding_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)
    key_padding_mask[:, -2:] = True  # mask last 2 positions

    # Initialize both implementations
    custom_mha = MultiheadAttentionCustom(d_model=d_model, n_heads=n_heads)
    flash_mha = MultiHeadAttention(d_model=d_model, n_heads=n_heads)
    
    # Set both models to eval mode to disable dropout
    custom_mha.eval()
    flash_mha.eval()
    
    # Copy weights to ensure identical parameters
    # Combine QKV weights from custom implementation into single matrix for flash attention
    combined_weight = torch.cat([
        custom_mha.W_Q.weight,
        custom_mha.W_K.weight,
        custom_mha.W_V.weight
    ], dim=0)
    combined_bias = torch.cat([
        custom_mha.W_Q.bias,
        custom_mha.W_K.bias,
        custom_mha.W_V.bias
    ], dim=0)
    
    # Copy combined weights to flash attention
    flash_mha.c_attn.weight.data = combined_weight
    flash_mha.c_attn.bias.data = combined_bias
    
    # Output projection weights
    flash_mha.c_proj.weight.data = custom_mha.to_out[0].weight.data.clone()
    flash_mha.c_proj.bias.data = custom_mha.to_out[0].bias.data.clone()
    
    # Forward pass
    with torch.no_grad():
        custom_output, custom_attn = custom_mha(x, key_padding_mask=key_padding_mask)
        
        flash_output = flash_mha(x, attn_mask=key_padding_mask)
    
    # Compare outputs
    print(f"Custom output shape: {custom_output.shape}")
    print(f"Flash output shape: {flash_output.shape}")
    
    output_close = torch.allclose(custom_output, flash_output, rtol=1e-5, atol=1e-7)  # tolerate tiny floating-point differences between the two attention paths
    print(f"Outputs match: {output_close}")
    
    if not output_close:
        print("\nOutput differences:")
        print(f"Max difference: {(custom_output - flash_output).abs().max().item()}")
        print(f"Mean difference: {(custom_output - flash_output).abs().mean().item()}")
    
    return custom_output, flash_output

custom_output, flash_output = test_attention_equivalence()
# observed max difference: 8.940696716308594e-08
# observed mean difference: 1.0550138540565968e-08
Custom output shape: torch.Size([2, 10, 64])
Flash output shape: torch.Size([2, 10, 64])
Outputs match: True
d_model=512
n_heads=8
d_k = d_v = d_model // n_heads
attn = ScaledDotProductAttention(d_model=d_model, n_heads=n_heads)
mha_attn = MultiheadAttentionCustom(d_model, n_heads)

W_Q = nn.Linear(d_model, d_k * n_heads)
W_K = nn.Linear(d_model, d_k * n_heads)
W_V = nn.Linear(d_model, d_v * n_heads)
X, _, _ = ds[0]  # ds: a dataset created earlier in the notebook

X = create_patch(X, patch_len=(10*50), stride=(5*50), constant_pad=True)

patch_len = X.shape[-1]

X = X[None, ...].permute(0,2,1,3)  # simulate batch size of 1 [bs x n_vars x num_patch x patch_len]

print(f'X input shape: {X.shape}')
W_P = nn.Linear(patch_len, d_model)

X = W_P(X) # project to d_model
print(f"Projected X shape to d_model: {X.shape}")

X = torch.reshape(X, (X.shape[0]*X.shape[1],X.shape[2],X.shape[3]))
print(f"Reshape for attention: {X.shape}")

# test multihead attention
print("\nTesting MHA and SDA attention, with just 50 elements.")
mha_output, mha_attn_weights = mha_attn(Q=X[:,:50,:])
print(f"MHA attention output shape: {mha_output.shape}, mha attn weight shape: {mha_attn_weights.shape}")

# test scaled dot product attn
K = Q = V = X

# Linear projections (+ split into multiple heads)
bs = 1  # treat all n_vars * num_patch tokens as one long sequence (7 * 10799 = 75593)
q_s = W_Q(Q).reshape(bs, -1, n_heads, d_k).transpose(1, 2)
k_s = W_K(K).reshape(bs, -1, n_heads, d_k).permute(0, 2, 3, 1)
v_s = W_V(V).reshape(bs, -1, n_heads, d_v).transpose(1, 2)
print(f"Q shape: {q_s.shape}, K shape: {k_s.shape}, V shape: {v_s.shape}")

to_out = nn.Linear(n_heads * d_v, d_model)
output, attn_weights = attn(q_s[:,:,:50,:],k_s[:,:,:,:50], v_s[:,:,:50,:])
output = output.transpose(1, 2).contiguous().view(bs, -1, n_heads * d_v)
print(f"Attn output shape {output.shape}, attn weight shape: {attn_weights.shape}")
X input shape: torch.Size([1, 7, 10799, 500])
Projected X shape to d_model: torch.Size([1, 7, 10799, 512])
Reshape for attention: torch.Size([7, 10799, 512])

Testing MHA and SDA attention, with just 50 elements.
MHA attention output shape: torch.Size([7, 50, 512]), mha attn weight shape: torch.Size([7, 8, 50, 50])
Q shape: torch.Size([1, 8, 75593, 64]), K shape: torch.Size([1, 8, 64, 75593]), V shape: torch.Size([1, 8, 75593, 64])
Attn output shape torch.Size([1, 50, 512]), attn weight shape: torch.Size([1, 8, 50, 50])

source

Attention_Rel_Scl


def Attention_Rel_Scl(
    d_model:int, # Embedding dimension
    n_heads:int, # number of attention heads
    seq_len:int, # sequence length or num patches
    d_k:int=None, # key dimension
    d_v:int=None, # value dimension
    res_attention:bool=False, # whether to use residual attention
    attn_dropout:float=0.0, # dropout for attention
    lsa:bool=False, # whether to use LSA, trainable paramater for scaling
    proj_dropout:float=0.0, # dropout for projection
    qkv_bias:bool=True, # bias for q, k, v
):

Multi-head attention with scaled relative position embeddings for sequences of length seq_len.

## test w patches [bs * c_in x num_patches x d_model]
d_model=512
c_in = 2
num_patches = 10
x_emb = torch.randn(4*c_in,num_patches, d_model)
abs_position = tAPE(d_model, seq_len=num_patches)
x_emb_pos = abs_position(x_emb)

model = Attention_Rel_Scl(d_model=d_model,
        n_heads=2, # number of attention heads
        seq_len=num_patches, # sequence length or num patches
        )

out, attn_weights = model(x_emb_pos)  # attend over the position-encoded embeddings