Transformers

A good use of time, no doubt.

Time Series Transformer Encoder


source

TSTEncoderLayer


def TSTEncoderLayer(
    d_model, # dimension of patch embeddings
    n_heads, # number of attention heads per layer
    d_ff:int=256, # dimension of the feedforward layer in each transformer layer
    store_attn:bool=False, # whether to store the attention weights
    norm:str='BatchNorm', # BatchNorm or LayerNorm
    relative_attn_type:str='vanilla', # options include vanilla or eRPE
    use_flash_attn:bool=False, # whether to use flash attention
    num_patches:NoneType=None, # number of patches (required for eRPE attention)
    attn_dropout:int=0, # dropout applied to the attention weights
    dropout:float=0.0, # dropout for linear layers
    bias:bool=True, # whether to use bias terms
    activation:str='gelu', # activation function
    res_attention:bool=False, # whether to use residual attention
    pre_norm:bool=False # apply normalization before (rather than after) attention/feedforward
):

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note: as per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
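A minimal usage sketch (not from the library's docs): a single encoder layer maps a batch of patch embeddings to an output of the same shape. The (batch, num_patches, d_model) input layout below is an assumption, as is the single-tensor return value (with the default res_attention=False).

import torch

# Sketch only: input layout and return type are assumptions.
layer = TSTEncoderLayer(d_model=512, n_heads=8, d_ff=2048,
                        norm='BatchNorm', relative_attn_type='vanilla')

x = torch.randn(32, 480, 512) # assumed layout: (batch, num_patches, d_model)
out = layer(x)                # same shape as the input: (32, 480, 512)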

Patch Time Series and Frequency Transformer


source

PatchTFTSimple


def PatchTFTSimple(
    c_in:int, # the number of input channels
    win_length, # patch length in the time domain, or the short-time FT window length (when time_domain=False)
    hop_length, # stride between consecutive patches / FFT windows
    max_seq_len, # maximum sequence length
    time_domain:bool=True, # operate on raw time-domain patches (True) or STFT features (False)
    pos_encoding_type:str='learned', # options include learned or tAPE
    relative_attn_type:str='vanilla', # options include vanilla or eRPE
    use_flash_attn:bool=False, # whether to use flash attention
    use_revin:bool=True, # if time_domain is true, whether or not to instance-normalize the time data
    dim1reduce:bool=False, # whether to normalize by timepoint in RevIN
    affine:bool=True, # if time_domain is true, whether or not to learn the RevIN normalization parameters
    mask_ratio:float=0.1, # fraction of the signal to mask
    augmentations:list=['patch_mask', 'jitter_zero_mask', 'reverse_sequence', 'shuffle_channels'], # augmentations to apply; options are patch_mask, jitter_zero_mask, reverse_sequence and shuffle_channels
    n_layers:int=2, # the number of transformer encoder layers to use
    d_model:int=512, # the dimension of the input to the transformer encoder
    n_heads:int=2, # the number of heads in each layer
    shared_embedding:bool=False, # whether or not each channel is projected to the encoder dimension with its own set of linear weights
    d_ff:int=2048, # the feedforward layer size in the transformer
    norm:str='BatchNorm', # BatchNorm or LayerNorm during training
    attn_dropout:float=0.0, # dropout in attention
    dropout:float=0.1, # dropout for linear layers
    act:str='gelu', # activation function
    res_attention:bool=True, # whether to use residual attention
    pre_norm:bool=False, # apply normalization before (rather than after) attention/feedforward
    store_attn:bool=False, # whether to store attention weights
    pretrain_head:bool=True, # whether to include a pretraining head
    pretrain_head_n_layers:int=1, # number of linear layers in the pretraining head
    pretrain_head_dropout:float=0.0, # dropout applied to the pretraining head
):

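PatchTFTSimple splits each channel into patches of win_length samples taken every hop_length samples (or into STFT frames when time_domain=False), embeds them to d_model, and runs them through a transformer encoder, with optional masking augmentations for self-supervised pretraining. The example below exercises it end to end in time-domain pretraining mode on a batch of 4 seven-channel sequences of 1*3600*100 = 360,000 samples (e.g. one hour at 100 Hz), with the first 100 samples of each sequence flagged as padding.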

import torch

XX = torch.randn(4, 7, 1*3600*100) # (batch, channels, samples)
pad = torch.zeros(4, 1*3600*100)   # sequence padding mask
pad[:, 0:100] = 1                  # flag the first 100 samples as padding
model = PatchTFTSimple(c_in=7,
                       win_length=750,
                       hop_length=750,
                       max_seq_len=(1*3600*100),
                       use_revin=True,
                       time_domain=True,
                       affine=False,
                       dim1reduce=False,
                       act='gelu',
                       use_flash_attn=True,
                       relative_attn_type='vanilla',
                       pos_encoding_type='learned',
                       mask_ratio=0.1,
                       augmentations=['jitter_zero_mask'],
                       n_layers=1,
                       n_heads=1,
                       d_model=512,
                       d_ff=2048,
                       dropout=0.,
                       attn_dropout=0.,
                       pre_norm=False,
                       res_attention=False,
                       shared_embedding=False,
                       pretrain_head=True)
r = model(XX, sequence_padding_mask=pad)
r[0].shape, r[1].shape, r[3].shape
(torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480]))
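With win_length=hop_length=750, the 360,000-sample input yields 360000 / 750 = 480 non-overlapping patches, which is the 480 in every output shape: the two (4, 480, 7, 750) tensors are per-patch signals (plausibly the reconstruction and its target), and the (4, 480) tensor is a patch-level mask.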
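For the frequency-domain path, a hypothetical call might look like the sketch below. It assumes only what the signature above states: with time_domain=False, win_length and hop_length act as the STFT window and hop, and RevIN applies only to time-domain inputs. The argument values are illustrative, not recommendations.

# Hypothetical sketch of frequency-domain mode; argument choices are illustrative.
model_f = PatchTFTSimple(c_in=7,
                         win_length=256,    # STFT window length (time_domain=False)
                         hop_length=128,    # STFT hop length
                         max_seq_len=(1*3600*100),
                         time_domain=False, # patch STFT features instead of raw samples
                         use_revin=False,   # RevIN only applies to time-domain data
                         n_layers=1,
                         n_heads=1,
                         d_model=512,
                         d_ff=2048)
r_f = model_f(XX, sequence_padding_mask=pad)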