Transformers

A good use of time, no doubt.

Time Series Transformer Encoder


source

TSTEncoderLayer

 TSTEncoderLayer (d_model, n_heads, d_ff=256, store_attn=False,
                  norm='BatchNorm', relative_attn_type='vanilla',
                  use_flash_attn=False, num_patches=None, attn_dropout=0,
                  dropout=0.0, bias=True, activation='gelu',
                  res_attention=False, pre_norm=False)

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call .to(), etc.

Note: as per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
|  | Type | Default | Details |
|----|------|---------|---------|
| d_model |  |  | dimension of patch embeddings |
| n_heads |  |  | number of attention heads per layer |
| d_ff | int | 256 | dimension of the feedforward layer in each transformer layer |
| store_attn | bool | False | whether to store attention |
| norm | str | BatchNorm | normalization type: BatchNorm or LayerNorm |
| relative_attn_type | str | vanilla | options include vanilla or eRPE |
| use_flash_attn | bool | False | whether to use flash attention |
| num_patches | NoneType | None | number of patches; required for eRPE attention |
| attn_dropout | int | 0 | dropout applied to attention |
| dropout | float | 0.0 | dropout applied to linear layers |
| bias | bool | True |  |
| activation | str | gelu | activation function |
| res_attention | bool | False | whether to use residual attention |
| pre_norm | bool | False | whether to apply normalization before the attention and feedforward sublayers |
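Below is a minimal usage sketch for a single encoder layer. The input layout (batch, num_patches, d_model) and the bare forward call are assumptions based on the usual convention for patch-based transformer encoders, not something stated above; check the source for the exact forward signature.

import torch

# Hypothetical usage sketch: the (batch, num_patches, d_model) input layout
# and the plain forward call are assumptions, not confirmed by these docs.
layer = TSTEncoderLayer(
    d_model=128,                   # patch embedding dimension
    n_heads=4,                     # attention heads per layer
    d_ff=256,                      # feedforward dimension
    norm='BatchNorm',              # or 'LayerNorm'
    relative_attn_type='vanilla',  # 'eRPE' would also require num_patches
    activation='gelu',
)

x = torch.randn(4, 480, 128)       # (batch, num_patches, d_model)
out = layer(x)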

Patch Time Series and Frequency Transformer


source

PatchTFTSimple

 PatchTFTSimple (c_in:int, win_length, hop_length, max_seq_len,
                 time_domain=True, pos_encoding_type='learned',
                 relative_attn_type='vanilla', use_flash_attn=False,
                 use_revin=True, dim1reduce=False, affine=True,
                 mask_ratio=0.1, augmentations=['patch_mask',
                 'jitter_zero_mask', 'reverse_sequence',
                 'shuffle_channels'], n_layers:int=2, d_model=512,
                 n_heads=2, shared_embedding=False, d_ff:int=2048,
                 norm:str='BatchNorm', attn_dropout:float=0.0,
                 dropout:float=0.1, act:str='gelu',
                 res_attention:bool=True, pre_norm:bool=False,
                 store_attn:bool=False, pretrain_head=True,
                 pretrain_head_n_layers=1, pretrain_head_dropout=0.0)

|  | Type | Default | Details |
|----|------|---------|---------|
| c_in | int |  | the number of input channels |
| win_length |  |  | the length of each time patch/interval, or the short-time Fourier transform window length (when time_domain=False) |
| hop_length |  |  | the stride (distance) between consecutive patches/FFT windows |
| max_seq_len |  |  | maximum sequence length |
| time_domain | bool | True | whether to operate on time-domain patches (True) or the STFT frequency representation (False) |
| pos_encoding_type | str | learned | options include learned or tAPE |
| relative_attn_type | str | vanilla | options include vanilla or eRPE |
| use_flash_attn | bool | False | whether to use flash attention |
| use_revin | bool | True | if time_domain is True, whether to instance-normalize the time data (RevIN) |
| dim1reduce | bool | False | whether to normalize by timepoint in RevIN |
| affine | bool | True | if time_domain is True, whether to learn the RevIN normalization parameters |
| mask_ratio | float | 0.1 | fraction of the signal to mask |
| augmentations | list | ['patch_mask', 'jitter_zero_mask', 'reverse_sequence', 'shuffle_channels'] | the augmentation/masking types to apply; options include patch_mask, jitter_zero_mask, reverse_sequence, and shuffle_channels |
| n_layers | int | 2 | the number of transformer encoder layers to use |
| d_model | int | 512 | the dimension of the input to the transformer encoder |
| n_heads | int | 2 | the number of attention heads in each layer |
| shared_embedding | bool | False | whether each channel is projected to the encoder dimension with its own set of linear weights |
| d_ff | int | 2048 | the size of the feedforward layer in the transformer |
| norm | str | BatchNorm | normalization type: BatchNorm or LayerNorm |
| attn_dropout | float | 0.0 | dropout applied to attention |
| dropout | float | 0.1 | dropout applied to linear layers |
| act | str | gelu | activation function |
| res_attention | bool | True | whether to use residual attention |
| pre_norm | bool | False | whether to apply normalization (batch or layer) before the attention and feedforward sublayers |
| store_attn | bool | False | whether to store attention |
| pretrain_head | bool | True | whether to include a pretraining head |
| pretrain_head_n_layers | int | 1 | number of linear layers in the pretraining head |
| pretrain_head_dropout | float | 0.0 | dropout applied to the pretraining head |
import torch

# dummy batch: 4 samples, 7 channels, 1*3600*100 = 360,000 timepoints
XX = torch.randn(4, 7, 1*3600*100)
# mask passed as sequence_padding_mask (first 100 timepoints flagged)
pad = torch.zeros(4, 1*3600*100)
pad[:, 0:100] = 1
model = PatchTFTSimple(c_in=7,
                        win_length=750,
                        hop_length=750,
                        max_seq_len=(1*3600*100),
                        use_revin=True,
                        time_domain=True,
                        affine=False,
                        dim1reduce=False,
                        act='gelu',
                        use_flash_attn=True,
                        relative_attn_type='vanilla',
                        pos_encoding_type='learned',
                        mask_ratio=0.1,
                        augmentations=['jitter_zero_mask'],
                        n_layers=1,
                        n_heads=1,
                        d_model=512,
                        d_ff=2048,
                        dropout=0.,
                        attn_dropout=0.,
                        pre_norm=False,
                        res_attention=False,
                        shared_embedding=False,
                        pretrain_head=True
                        )
r = model(XX, sequence_padding_mask=pad)
r[0].shape, r[1].shape, r[3].shape
(torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480, 7, 750]),
 torch.Size([4, 480]))
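For reference, the 480 in these shapes is just max_seq_len / hop_length = 360000 / 750 = 480 patches. As a heavily hedged sketch of how these outputs could feed a masked-reconstruction pretraining loss, assuming r[0] is the patch reconstruction, r[1] the patched target, and r[3] a per-patch mask (these roles are inferred from the shapes above, not stated by the docs):

# Hypothetical masked-MSE pretraining loss; the roles of r[0], r[1], r[3]
# are assumptions inferred only from their shapes.
recon, target, patch_mask = r[0], r[1], r[3]   # [4, 480, 7, 750], [4, 480, 7, 750], [4, 480]
# mean squared error per patch, averaged over channels and patch timepoints
per_patch_err = ((recon - target) ** 2).mean(dim=(-1, -2))   # [4, 480]
# average over masked patches only (assuming 1 marks a masked patch)
loss = (per_patch_err * patch_mask).sum() / patch_mask.sum().clamp(min=1)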