Transformers

A good use of time, no doubt.

Time Series Transformer Encoder


source

TSTEncoderLayer

 TSTEncoderLayer (d_model, n_heads, d_ff=256, store_attn=False,
                  norm='BatchNorm', relative_attn_type='vanilla',
                  use_flash_attn=False, num_patches=None, attn_dropout=0,
                  dropout=0.0, bias=True, activation='gelu',
                  res_attention=False, pre_norm=False)

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call .to(), etc.

Note: as per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
|  | Type | Default | Details |
|----|------|---------|---------|
| d_model |  |  | dimension of patch embeddings |
| n_heads |  |  | number of attention heads per layer |
| d_ff | int | 256 | dimension of the feedforward layer in each transformer layer |
| store_attn | bool | False | whether to store attention |
| norm | str | BatchNorm | normalization type: BatchNorm or LayerNorm |
| relative_attn_type | str | vanilla | options include vanilla or eRPE |
| use_flash_attn | bool | False | whether to use flash attention |
| num_patches | NoneType | None | number of patches; required for eRPE attention |
| attn_dropout | int | 0 | dropout applied to attention |
| dropout | float | 0.0 | dropout applied to linear layers |
| bias | bool | True |  |
| activation | str | gelu | activation function |
| res_attention | bool | False | whether to use residual attention |
| pre_norm | bool | False | whether to apply normalization before the attention and feedforward sublayers |
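Below is a minimal usage sketch for a single encoder layer. The input layout (batch, num_patches, d_model) and the bare forward call are assumptions based on the usual convention for patch-based transformer encoders, not something stated above; check the source for the exact forward signature.

import torch

# Hypothetical usage sketch: the (batch, num_patches, d_model) input layout
# and the plain forward call are assumptions, not confirmed by these docs.
layer = TSTEncoderLayer(
    d_model=128,                   # patch embedding dimension
    n_heads=4,                     # attention heads per layer
    d_ff=256,                      # feedforward dimension
    norm='BatchNorm',              # or 'LayerNorm'
    relative_attn_type='vanilla',  # 'eRPE' would also require num_patches
    activation='gelu',
)

x = torch.randn(4, 480, 128)       # (batch, num_patches, d_model)
out = layer(x)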

Patch Time Series and Frequency Transformer


source

PatchTFTSimple

 PatchTFTSimple (c_in:int, win_length, hop_length, max_seq_len,
                 time_domain=True, pos_encoding_type='learned',
                 relative_attn_type='vanilla', use_flash_attn=False,
                 use_revin=True, dim1reduce=False, affine=True,
                 mask_ratio=0.1, augmentations=['patch_mask',
                 'jitter_zero_mask', 'reverse_sequence',
                 'shuffle_channels'], n_layers:int=2, d_model=512,
                 n_heads=2, shared_embedding=False, d_ff:int=2048,
                 norm:str='BatchNorm', attn_dropout:float=0.0,
                 dropout:float=0.1, act:str='gelu',
                 res_attention:bool=True, pre_norm:bool=False,
                 store_attn:bool=False, pretrain_head=True,
                 pretrain_head_n_layers=1, pretrain_head_dropout=0.0)

|  | Type | Default | Details |
|----|------|---------|---------|
| c_in | int |  | the number of input channels |
| win_length |  |  | the length of each time patch/interval, or the short-time Fourier transform window length (when time_domain=False) |
| hop_length |  |  | the stride (distance) between consecutive patches/FFT windows |
| max_seq_len |  |  | maximum sequence length |
| time_domain | bool | True | whether to operate on time-domain patches (True) or the STFT frequency representation (False) |
| pos_encoding_type | str | learned | options include learned or tAPE |
| relative_attn_type | str | vanilla | options include vanilla or eRPE |
| use_flash_attn | bool | False | whether to use flash attention |
| use_revin | bool | True | if time_domain is True, whether to instance-normalize the time data (RevIN) |
| dim1reduce | bool | False | whether to normalize by timepoint in RevIN |
| affine | bool | True | if time_domain is True, whether to learn the RevIN normalization parameters |
| mask_ratio | float | 0.1 | fraction of the signal to mask |
| augmentations | list | ['patch_mask', 'jitter_zero_mask', 'reverse_sequence', 'shuffle_channels'] | the augmentation/masking types to apply; options include patch_mask, jitter_zero_mask, reverse_sequence, and shuffle_channels |
| n_layers | int | 2 | the number of transformer encoder layers to use |
| d_model | int | 512 | the dimension of the input to the transformer encoder |
| n_heads | int | 2 | the number of attention heads in each layer |
| shared_embedding | bool | False | whether each channel is projected to the encoder dimension with its own set of linear weights |
| d_ff | int | 2048 | the size of the feedforward layer in the transformer |
| norm | str | BatchNorm | normalization type: BatchNorm or LayerNorm |
| attn_dropout | float | 0.0 | dropout applied to attention |
| dropout | float | 0.1 | dropout applied to linear layers |
| act | str | gelu | activation function |
| res_attention | bool | True | whether to use residual attention |
| pre_norm | bool | False | whether to apply normalization (batch or layer) before the attention and feedforward sublayers |
| store_attn | bool | False | whether to store attention |
| pretrain_head | bool | True | whether to include a pretraining head |
| pretrain_head_n_layers | int | 1 | number of linear layers in the pretraining head |
| pretrain_head_dropout | float | 0.0 | dropout applied to the pretraining head |
import torch

# dummy batch: 4 samples, 7 channels, 1*3600*100 = 360,000 timepoints
XX = torch.randn(4, 7, 1*3600*100)
# mask passed as sequence_padding_mask (first 100 timepoints flagged)
pad = torch.zeros(4, 1*3600*100)
pad[:, 0:100] = 1
model = PatchTFTSimple(c_in=7,
                        win_length=750,
                        hop_length=750,
                        max_seq_len=(1*3600*100),
                        use_revin=True,
                        time_domain=True,
                        affine=False,
                        dim1reduce=False,
                        act='gelu',
                        use_flash_attn=True,
                        relative_attn_type='vanilla',
                        pos_encoding_type='learned',
                        mask_ratio=0.1,
                        augmentations=['jitter_zero_mask'],
                        n_layers=1,
                        n_heads=1,
                        d_model=512,
                        d_ff=2048,
                        dropout=0.,
                        attn_dropout=0.,
                        pre_norm=False,
                        res_attention=False,
                        shared_embedding=False,
                        pretrain_head=True
                        )
r = model(XX, sequence_padding_mask=pad)
r[0].shape, r[1].shape, r[3].shape
(torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480]))
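For reference, the 480 in these shapes is just max_seq_len / hop_length = 360000 / 750 = 480 patches. As a heavily hedged sketch of how these outputs could feed a masked-reconstruction pretraining loss, assuming r[0] is the patch reconstruction, r[1] the patched target, and r[3] a per-patch mask (these roles are inferred from the shapes above, not stated by the docs):

# Hypothetical masked-MSE pretraining loss; the roles of r[0], r[1], r[3]
# are assumptions inferred only from their shapes.
recon, target, patch_mask = r[0], r[1], r[3]   # [4, 480, 7, 750], [4, 480, 7, 750], [4, 480]
# mean squared error per patch, averaged over channels and patch timepoints
per_patch_err = ((recon - target) ** 2).mean(dim=(-1, -2))   # [4, 480]
# average over masked patches only (assuming 1 marks a masked patch)
loss = (per_patch_err * patch_mask).sum() / patch_mask.sum().clamp(min=1)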