Slumber

Sleepy datasets and helpers

Constants

EDF Helpers

Read EDFs


source

read_edf


def read_edf(
    file_path, # file path of edf
    channels:NoneType=None, # channels in the EDF to read; a warning is raised if any do not exist
    frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary

Read an EDF file and return a tuple of signals and a header dictionary, with the option to resample all signals to the passed frequency.
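The `frequency` argument resamples every signal to a common rate. A minimal sketch of that step, using linear interpolation with NumPy (a stand-alone illustration; the library's actual resampling method may differ):

```python
import numpy as np

def resample_signal(signal, src_freq, target_freq):
    """Resample a 1-D signal from src_freq to target_freq Hz via linear interpolation."""
    duration = len(signal) / src_freq           # recording length in seconds
    n_out = int(round(duration * target_freq))  # number of samples at the target rate
    src_t = np.arange(len(signal)) / src_freq   # original sample times
    out_t = np.arange(n_out) / target_freq      # target sample times
    return np.interp(out_t, src_t, signal)
```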


source

read_edf_mne


def read_edf_mne(
    file_path, # file path of edf
    channels:NoneType=None, # channels in the EDF to read; a warning is raised if any do not exist
    frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary

Read an EDF file with the MNE library. Using this is not recommended; use `read_edf_edfio` (edfio) instead.


source

read_edf_edfio


def read_edf_edfio(
    file_path, # file path of edf
    channels:NoneType=None, # channels in the EDF to read; a warning is raised if any do not exist
    frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary

Read an EDF file with the edfio library.

Read Hypnograms


source

read_hypnogram


def read_hypnogram(
    file, # file path of the hypnogram csv
    epoch_length:NoneType=None, # epoch length of the hypnogram measurements; if passed, each element is repeated this many times
)->array: # numpy array of hypnogram

Read a hypnogram CSV and return a NumPy array of the hypnogram, optionally repeating each element `epoch_length` times.
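The `epoch_length` expansion can be sketched with `np.repeat` (a hypothetical stand-alone helper, not the library function itself):

```python
import numpy as np

def expand_hypnogram(stages, epoch_length=None):
    """Repeat each scored stage epoch_length times, e.g. one label per second for 30 s epochs."""
    stages = np.asarray(stages)
    if epoch_length is not None:
        stages = np.repeat(stages, epoch_length)
    return stages
```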

EDFs to Zarr


source

edf_signals_to_zarr


def edf_signals_to_zarr(
    edf_file_path, write_data_dir, overwrite:bool=False, channels:NoneType=None, channel_name_map:NoneType=None,
    frequency:NoneType=None, hyp_epoch_length:int=30, hyp_data_dir:NoneType=None
):

Convert an EDF file to a zarr store.
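The conversion broadly reads the EDF channels, optionally renames them via `channel_name_map`, and writes one array per channel. A simplified sketch of the channel-mapping step, with a plain dict standing in for the zarr group (the helper name is hypothetical):

```python
import numpy as np

def map_and_store(signals, channel_names, channel_name_map=None):
    """Store one array per channel under its (optionally remapped) name.

    A dict stands in for a zarr group here; the real function writes to disk."""
    store = {}
    for name, sig in zip(channel_names, signals):
        # Rename the channel if a mapping is supplied, otherwise keep the EDF name
        key = (channel_name_map or {}).get(name, name)
        store[key] = np.asarray(sig)
    return store
```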

Datasets

Self Supervised Dataset


source

remove_wake_epochs_from_signals


def remove_wake_epochs_from_signals(
    X, hypnogram, resampled_hypnogram_length, padding_mask:int=-100
):

Trim wake epochs (when wake is the largest class) from signals.

Shapes: X is (bs, channels, seq_len), hypnogram is (bs, seq_len / resampled_hypnogram_length), and sequence_padding_mask is (bs, seq_len).


source

trim_wake_epochs_from_hypnogram


def trim_wake_epochs_from_hypnogram(
    hypnogram, padding_mask:int=-100
):

Trim wake epochs (when wake is the largest class) from hypnograms. Contiguous wake epochs are trimmed from the beginning and/or end of the hypnogram.

Adapted from Phan et al., L-SeqSleepNet.
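The trimming logic can be sketched as follows: when wake is the most frequent stage, drop the contiguous run of wake epochs at each end of the hypnogram. This hypothetical helper returns the kept slice bounds rather than the trimmed array:

```python
import numpy as np

def trim_wake(hypnogram, wake_stage=0):
    """Return (start, end) bounds keeping everything between the leading and
    trailing wake runs, but only when wake is the most common stage."""
    hyp = np.asarray(hypnogram)
    stages, counts = np.unique(hyp, return_counts=True)
    if stages[np.argmax(counts)] != wake_stage:
        return 0, len(hyp)                      # wake is not the largest class: keep all
    non_wake = np.flatnonzero(hyp != wake_stage)
    if non_wake.size == 0:
        return 0, len(hyp)                      # all wake: nothing sensible to trim to
    return int(non_wake[0]), int(non_wake[-1]) + 1
```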

Sleep Stage Self Supervised Dataset


source

SelfSupervisedHypnogramTimeDataset


def SelfSupervisedHypnogramTimeDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between consecutive samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    frequency, # frequency of underlying data
    min_seq_len_sec:NoneType=None, # minimum sequence length (in seconds) to use
    start_offset_sec:int=0, # number of seconds to exclude from beginning of sleep studies
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms, if it is the largest class
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is false
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file includes a sample
    return_hypnogram_every_sec:int=30, # integer value indicating the step in indexing in seconds
    hypnogram_padding_mask:int=-100, # pad value added to the target; indices with this value are ignored when computing the loss
    hypnogram_frequency:int=1, # frequency of underlying y hypnogram data
    butterworth_filters:dict={'ECG': [None, 0.3], 'ECG (LL-RA)': [None, 0.3], 'EKG': [None, 0.3], 'ECG (L-R)': [None, 0.3], 'ECG2': [None, 0.3], 'EOG(L)': [0.3, 45], 'EOG-L': [0.3, 45], 'E1': [0.3, 45], 'LOC': [0.3, 45], 'E1-M2': [0.3, 45], 'E1-AVG': [0.3, 45], 'EMG': [None, 10], 'cchin_l': [None, 10], 'chin': [None, 10], 'EMG (L-R)': [None, 10], 'Chin 1-Chin 2': [None, 10], 'EMG (1-2)': [None, 10], 'EMG (1-3)': [None, 10], 'Chin3': [None, 10], 'cchin': [None, 10], 'C4-M1': [0.3, 45], 'C4_M1': [0.3, 45], 'EEG': [0.3, 45], 'EEG3': [0.3, 45], 'C3-M2': [0.3, 45], 'C3_M2': [0.3, 45], 'EEG(sec)': [0.3, 45], 'C4-AVG': [0.3, 45], 'C3-AVG': [0.3, 45], 'THOR RES': [0.1, 15], 'Thor': [0.1, 15], 'thorax': [0.1, 15], 'Thoracic': [0.1, 15], 'Chest': [0.1, 15], 'ABDO RES': [0.1, 15], 'abdomen': [0.1, 15], 'Abdo': [0.1, 15], 'Abdominal': [0.1, 15], 'ABD': [0.1, 15], 'Abd': [0.1, 15]}, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
    median_filter_kernel_size:NoneType=None, # if not none, will apply median filter with kernel size
    voltage_channels:NoneType=None, # if not None, the units of these channels are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    return_hyponogram:bool=True, # indicator to return the hypnogram (e.g. for a classification task); otherwise returns X,X
    normalize_signals:bool=True, # indicator to normalize signals
    constant_nan_tolerance:float=1.0, # tolerance for nan values in signals
    constant_channels:list=['SaO2', 'SpO2', 'spo2', 'SPO2'], # channels to check for constant values
    hypnogram_required_stages:list=[0, 1, 2, 3, 4], # hypnogram stages that must be present in a sample
    hypnogram_constant_tolerance:float=1.0, # tolerance for constant values in hypnogram
):

An abstract class representing a `Dataset`.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite `__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__`, which is expected to return the size of the dataset by many `Sampler` implementations and the default options of `DataLoader`. Subclasses could also optionally implement `__getitems__` to speed up batched sample loading; this method accepts a list of sample indices for a batch and returns a list of samples.

Note: `DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
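When no `sample_df` is passed, the dataset cuts each recording into windows of `sample_seq_len_sec`, advancing by `sample_stride_sec`. A sketch of that enumeration (a hypothetical helper; the real indexing also respects `min_seq_len_sec` and per-channel filtering):

```python
def window_starts(total_sec, seq_len_sec, stride_sec, start_offset_sec=0,
                  include_partial=True):
    """Enumerate window start times (in seconds) for slicing one recording into samples."""
    starts = []
    t = start_offset_sec
    while t < total_sec:
        # A partial window hangs past the end of the recording; keep it only if allowed
        if include_partial or t + seq_len_sec <= total_sec:
            starts.append(t)
        t += stride_sec
    return starts
```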


source

padded_tensor_sequence_collate


def padded_tensor_sequence_collate(
    batch, max_seq_len, frequency, return_hypnogram:bool=False, hypnogram_frequency:int=1, X_pad_value:float=0.0,
    hypnogram_padding_mask:int=-100
):

Collate function for variable-length sequences with padding.

Args:
batch: list of tuples (X, Y) where X is a tensor of shape (channels, seq_len) and Y is a target tensor (channels, seq_len) or a hypnogram.

Returns:
X: tensor containing the batch of sequences.
Y: tensor containing the batch of targets.
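The padding behaviour can be sketched with NumPy (the real collate works on torch tensors and also pads the hypnogram targets with `hypnogram_padding_mask`):

```python
import numpy as np

def pad_collate(batch, max_seq_len, pad_value=0.0):
    """Right-pad each (channels, seq_len) array to max_seq_len and stack
    into a (bs, channels, max_seq_len) batch."""
    padded = []
    for x in batch:
        pad = max_seq_len - x.shape[-1]
        padded.append(np.pad(x, ((0, 0), (0, pad)), constant_values=pad_value))
    return np.stack(padded)
```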


source

nested_tensor_collate


def nested_tensor_collate(
    batch
):

Collate function for variable-length sequences using NestedTensor.

Args:
batch: list of tuples (X, Y, idx) where X is a tensor of shape (channels, seq_len), Y is a single outcome (bs), and idx is an optional index of the sample in the dataset.

Returns:
X_nested: NestedTensor containing the batch of sequences.
Y: tensor containing the batch of targets.
idx: optional tensor of indices.


source

nested_tensor_sequence_collate


def nested_tensor_sequence_collate(
    batch
):

Collate function for variable-length sequences using NestedTensor.

Args:
batch: list of tuples (X, Y) where X is a tensor of shape (channels, seq_len) and Y is a target tensor (channels, seq_len) or a hypnogram.

Returns:
X_nested: NestedTensor containing the batch of sequences.
Y: tensor containing the batch of targets.


source

nested_tensor_sequence_multi_label_collate


def nested_tensor_sequence_multi_label_collate(
    batch
):

Collate function for variable-length sequences using NestedTensor.

Args:
batch: list of tuples (X, Y, time) where X is a nested tensor of shape (channels, seq_len), Y is a tensor of shape (n_events), and time is a tensor of shape (n_events).

Returns:
X_nested: NestedTensor containing the batch of sequences.
Y: tensor containing the batch of targets.
time: tensor containing the batch of times.


source

padded_tensor_sequence_multi_label_collate


def padded_tensor_sequence_multi_label_collate(
    batch, max_seq_len, X_pad_value:float=0.0
):

Collate function for variable-length sequences with padding.

Args:
batch: list of tuples (X, Y, time) where X is a tensor of shape (channels, seq_len), Y is a tensor of shape (n_events), and time is a tensor of shape (n_events).

Returns:
X: padded tensor containing the batch of sequences.
Y: tensor containing the batch of targets.
time: tensor containing the batch of times.

Single Outcome Dataset


source

SingleOutcomeDataset


def SingleOutcomeDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between consecutive samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    y_outcome_df, # file path containing values for outcome of interest
    min_seq_len_sec:NoneType=None, # minimum sequence length (in seconds) to use
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms, if it is the largest class
    return_hypnogram_every_sec:int=30, # integer value indicating the step in indexing in seconds
    hypnogram_padding_mask:int=-100, # pad value added to the target; indices with this value are ignored when computing the loss
    hypnogram_frequency:int=125, # frequency of underlying y hypnogram data
    y_mapping_column:str='filepath', # column mapping corresponding outcome value to zarr file
    y_outcome:str='', # outcome column in the y file path
    y_time_column:NoneType=None, # column in the y file path that contains the time of the event or censored
    y_demographic_columns:NoneType=None, # list of demographic columns to return as part of the outcome
    y_demographic_norm_stats:dict={}, # dictionary of demographic columns to normalize with mean and std
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is false
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file includes a sample
    start_offset_sec:int=0, # number of seconds to exclude from beginning of sleep studies
    frequency:int=128, # frequency of underlying data
    butterworth_filters:dict={'ECG': [None, 0.3], 'ECG (LL-RA)': [None, 0.3], 'EKG': [None, 0.3], 'ECG (L-R)': [None, 0.3], 'ECG2': [None, 0.3], 'EOG(L)': [0.3, 45], 'EOG-L': [0.3, 45], 'E1': [0.3, 45], 'LOC': [0.3, 45], 'E1-M2': [0.3, 45], 'E1-AVG': [0.3, 45], 'EMG': [None, 10], 'cchin_l': [None, 10], 'chin': [None, 10], 'EMG (L-R)': [None, 10], 'Chin 1-Chin 2': [None, 10], 'EMG (1-2)': [None, 10], 'EMG (1-3)': [None, 10], 'Chin3': [None, 10], 'cchin': [None, 10], 'C4-M1': [0.3, 45], 'C4_M1': [0.3, 45], 'EEG': [0.3, 45], 'EEG3': [0.3, 45], 'C3-M2': [0.3, 45], 'C3_M2': [0.3, 45], 'EEG(sec)': [0.3, 45], 'C4-AVG': [0.3, 45], 'C3-AVG': [0.3, 45], 'THOR RES': [0.1, 15], 'Thor': [0.1, 15], 'thorax': [0.1, 15], 'Thoracic': [0.1, 15], 'Chest': [0.1, 15], 'ABDO RES': [0.1, 15], 'abdomen': [0.1, 15], 'Abdo': [0.1, 15], 'Abdominal': [0.1, 15], 'ABD': [0.1, 15], 'Abd': [0.1, 15]}, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
    median_filter_kernel_size:NoneType=None, # if not none, will apply median filter with kernel size
    voltage_channels:NoneType=None, # if not None, the units of these channels are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    normalize_signals:bool=True, # indicator to normalize signals
    constant_nan_tolerance:float=1.0, # tolerance for nan values in signals
    constant_channels:list=['SaO2', 'SpO2', 'spo2', 'SPO2'], # channels to check for constant values
):

An abstract class representing a `Dataset`.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite `__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__`, which is expected to return the size of the dataset by many `Sampler` implementations and the default options of `DataLoader`. Subclasses could also optionally implement `__getitems__` to speed up batched sample loading; this method accepts a list of sample indices for a batch and returns a list of samples.

Note: `DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
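The per-file outcome lookup via `y_mapping_column` and `y_outcome` can be sketched with plain dicts standing in for the dataframe rows (the helper name and the `ahi` outcome column are hypothetical):

```python
def outcomes_for_files(zarr_files, outcome_rows, mapping_column="filepath",
                       outcome_column="ahi"):
    """Return one outcome value per zarr file by joining on mapping_column."""
    # Build a filepath -> outcome lookup, then resolve each file in order
    lookup = {row[mapping_column]: row[outcome_column] for row in outcome_rows}
    return [lookup[f] for f in zarr_files]
```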