Slumber

I’m tired too. Here are some things that might help.

Constants

EDF Helpers

Read EDFs


source

read_edf


def read_edf(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
):

Read an EDF file and return a list of signals and a header, with the option to resample all signals to a passed frequency.
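The optional resampling can be pictured with a minimal linear-interpolation sketch (the function below is illustrative only; the real `read_edf` may use a polyphase or FFT-based resampler instead):

```python
import numpy as np

def resample_signal(signal, orig_freq, target_freq):
    # Minimal linear-interpolation resampler (sketch, not the library's method).
    duration = len(signal) / orig_freq
    n_target = int(round(duration * target_freq))
    old_t = np.arange(len(signal)) / orig_freq
    new_t = np.arange(n_target) / target_freq
    return np.interp(new_t, old_t, signal)

# 10 s of a 1 Hz sine sampled at 100 Hz, resampled to 125 Hz
sig = np.sin(2 * np.pi * 1.0 * np.arange(1000) / 100)
resampled = resample_signal(sig, orig_freq=100, target_freq=125)
```

Resampling to a common frequency is what lets signals from different channels be stacked into one array downstream.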


source

read_edf_mne


def read_edf_mne(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
)->tuple: # tuple of signals and header dictionary

Read an EDF file with the MNE library. Using this is not recommended; use read_edf_edfio instead.


source

read_edf_edfio


def read_edf_edfio(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
)->tuple: # tuple of signals and header dictionary

Read an EDF file with the edfio library.

Read Hypnograms


source

read_hypnogram


def read_hypnogram(
    file, # file path of the hypnogram csv
    epoch_length:int|None=None, # epoch length of the hypnogram measurements; if passed, each element is repeated this many times
)->np.ndarray: # numpy array of the hypnogram

Read a hypnogram CSV and return it as a numpy array, optionally repeating each stage label.
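The `epoch_length` repeat can be illustrated directly with `numpy.repeat` (the stage values here are made up for the example):

```python
import numpy as np

# Hypothetical hypnogram: one stage label per 30-second scoring epoch.
stages = np.array([0, 0, 2, 3, 2, 0])

# Passing epoch_length=30 repeats each label 30 times, yielding one
# label per second, aligned with per-second signal features.
per_second = np.repeat(stages, 30)
```

This expansion makes the hypnogram indexable at the same granularity as the resampled signals.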

EDFs to Zarr


source

edf_signals_to_zarr


def edf_signals_to_zarr(
    edf_file_path, write_data_dir, overwrite:bool=False, channels:list|None=None, channel_name_map:dict|None=None,
    frequency:int|None=None, hyp_epoch_length:int=30, hyp_data_dir:str|None=None
):

Convert an EDF file to a Zarr store.

try_mne: falls back to loading files with MNE instead of pyedflib when pyedflib raises an error. This is risky, as MNE converts units (and potentially resamples), while pyedflib does not.
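The unit concern above comes down to a scale factor per channel. A minimal sketch of normalising voltage channels to microvolts (the helper and its unit table are hypothetical; the real pipeline reads units from the EDF header):

```python
import numpy as np

# Hypothetical scale table from reported unit to microvolts.
UNIT_TO_UV = {'uv': 1.0, 'µv': 1.0, 'mv': 1_000.0, 'v': 1_000_000.0}

def to_microvolts(signal, unit):
    # Rescale a voltage signal to microvolts based on its declared unit.
    scale = UNIT_TO_UV.get(unit.strip().lower())
    if scale is None:
        raise ValueError(f'unknown voltage unit: {unit!r}')
    return np.asarray(signal) * scale

ecg_mv = np.array([0.5, -0.25])       # ECG recorded in millivolts
ecg_uv = to_microvolts(ecg_mv, 'mV')  # now in microvolts
```

Normalising units explicitly at conversion time avoids the silent rescaling that makes the MNE fallback dangerous.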

Datasets

Self Supervised Dataset


source

trim_wake_epochs_from_signals


def trim_wake_epochs_from_signals(
    X, hypnogram, sequence_padding_mask, resampled_hypnogram_length, mask_x_with_zeros:bool=False,
    padding_mask:int=-100
):

Trim wake epochs from signals when wake is the largest class.

X: (bs, channels, seq_len)
hypnogram: (bs, seq_len / resampled_hypnogram_length)
sequence_padding_mask: (bs, seq_len)


source

trim_wake_epochs_from_hypnogram


def trim_wake_epochs_from_hypnogram(
    hypnogram, padding_mask:int=-100
):

Trim wake epochs from a hypnogram when wake is the largest class. Wake epochs are trimmed from the beginning and/or end of the hypnogram.

Adapted from L-SeqSleepNet (Phan et al.)
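A minimal numpy sketch of the trimming idea, assuming wake is stage 0 and -100 is the padding value (the real function's exact behaviour, e.g. any wake buffer it keeps, may differ):

```python
import numpy as np

WAKE, PAD = 0, -100

def trim_wake(hypnogram, padding_mask=PAD):
    # If wake is the most common stage, mask leading/trailing wake
    # epochs with the padding value so the loss ignores them.
    hyp = np.asarray(hypnogram).copy()
    valid = hyp != padding_mask
    stages, counts = np.unique(hyp[valid], return_counts=True)
    if stages[np.argmax(counts)] != WAKE:
        return hyp  # wake is not the largest class; leave untouched
    idx = np.flatnonzero(valid & (hyp != WAKE))
    if idx.size == 0:
        hyp[valid] = padding_mask  # all-wake recording
        return hyp
    hyp[:idx[0]][hyp[:idx[0]] == WAKE] = padding_mask
    hyp[idx[-1] + 1:][hyp[idx[-1] + 1:] == WAKE] = padding_mask
    return hyp

trimmed = trim_wake(np.array([0, 0, 0, 2, 3, 2, 0, 0]))
```

Only the edges are touched: wake epochs between sleep stages are kept, since they are legitimate arousals.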


source

SelfSupervisedTimeFrequencyDataset


def SelfSupervisedTimeFrequencyDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (especially relevant when returning both STFT and raw time-series data, to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    start_offset_sec:int=0, # number of seconds to exclude from the beginning of sleep studies
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms if wake is the largest class
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is False
    sample_df:pd.DataFrame|None=None, # dataframe indicating which indices within each zarr file constitute a sample
    frequency:int=125, # frequency of the underlying data
    return_hypnogram_every_sec:int=30, # step, in seconds, at which hypnogram labels are returned
    hypnogram_padding_mask:int=-100, # padding value added to the target and ignored when computing the loss
    hypnogram_frequency:int=125, # frequency of the underlying hypnogram data
    butterworth_filters:dict|None=None, # dictionary of low-pass, high-pass, and band-pass settings to apply per channel
    median_filter_kernel_size:int|None=None, # if not None, apply a median filter with this kernel size
    voltage_channels:list=['ECG', 'ECG (LL-RA)', 'EKG', 'ECG (L-R)', 'EOG(L)', 'EOG-L', 'E1', 'LOC', 'E1-M2', 'E1-AVG', 'EMG', 'cchin_l', 'chin', 'EMG (L-R)', 'EMG (1-2)', 'EMG (1-3)', 'Chin3', 'C4-M1', 'C4_M1', 'EEG', 'EEG1', 'EEG2', 'EEG3', 'C3-M2', 'C3_M2', 'C4-AVG'], # if not None, these channels' units are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:dict|None=None, # dictionary of channel:{'phys_range':..., 'percentiles':...} for clipping and interpolation of clipped values
    scale_channels:bool=False, # indicator to scale channels to the mean and std of the zarr files
    time_channel_scales:dict|None=None, # dictionary of channel:mean and channel:std values for scaling; should use training statistics
    return_sequence_padding_mask:bool=False, # indicator to return the key padding mask for attention masking
):

A map-style torch.utils.data.Dataset that yields self-supervised time-frequency (and optionally raw time-series) samples read from Zarr stores, with optional filtering, unit conversion, scaling, and wake-epoch trimming.
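When no `sample_df` is passed, samples are generated from `sample_seq_len_sec` and `sample_stride_sec`. A minimal sketch of how window start indices could be enumerated over one array (the function name and details are illustrative, not the class's internals):

```python
def sample_start_indices(n_samples, seq_len_sec, stride_sec, frequency=125,
                         include_partial=True):
    # Enumerate window start indices over an array of n_samples points.
    seq_len = seq_len_sec * frequency
    stride = stride_sec * frequency
    starts = list(range(0, n_samples, stride))
    if not include_partial:
        # Keep only windows that fit entirely within the array.
        starts = [s for s in starts if s + seq_len <= n_samples]
    return starts

# 1 hour of 125 Hz data, 5-minute windows, no overlap, full windows only
starts = sample_start_indices(3600 * 125, 300, 300, include_partial=False)
```

With `stride_sec == seq_len_sec` the windows tile the recording without overlap; a smaller stride produces overlapping samples.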

Sleep Stage Supervised Dataset


source

HypnogramTimeFrequencyDataset


def HypnogramTimeFrequencyDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (especially relevant when returning both STFT and raw time-series data, to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    start_offset_sec:int=0, # number of seconds to exclude from the beginning of sleep studies
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms if wake is the largest class
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is False
    sample_df:pd.DataFrame|None=None, # dataframe indicating which indices within each zarr file constitute a sample
    frequency:int=125, # frequency of the underlying data
    return_y_every_sec:int=30, # step, in seconds, at which target labels are returned
    y_padding_mask:int=-100, # padding value added to the target and ignored when computing the loss
    y_frequency:int=125, # frequency of the underlying y hypnogram data
    butterworth_filters:dict|None=None, # dictionary of low-pass, high-pass, and band-pass settings to apply per channel
    median_filter_kernel_size:int|None=None, # if not None, apply a median filter with this kernel size
    voltage_channels:list=['ECG', 'ECG (LL-RA)', 'EKG', 'ECG (L-R)', 'EOG(L)', 'EOG-L', 'E1', 'LOC', 'E1-M2', 'E1-AVG', 'EMG', 'cchin_l', 'chin', 'EMG (L-R)', 'EMG (1-2)', 'EMG (1-3)', 'Chin3', 'C4-M1', 'C4_M1', 'EEG', 'EEG1', 'EEG2', 'EEG3', 'C3-M2', 'C3_M2', 'C4-AVG'], # if not None, these channels' units are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:dict|None=None, # dictionary of channel:{'phys_range':..., 'percentiles':...} for clipping and interpolation of clipped values
    scale_channels:bool=False, # indicator to scale channels to the mean and std of the zarr files
    time_channel_scales:dict|None=None, # dictionary of channel:mean and channel:std values for scaling; should use training statistics
    return_sequence_padding_mask:bool=False, # indicator to return the key padding mask for attention masking
):

A map-style torch.utils.data.Dataset that yields time-frequency samples together with sleep-stage (hypnogram) targets read from Zarr stores, with optional filtering, unit conversion, scaling, and wake-epoch trimming.
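The `return_y_every_sec` and `y_frequency` parameters together determine how a sample-rate hypnogram is reduced to one target label per scoring epoch. A minimal sketch, assuming one label per sample at `y_frequency` (the helper name is illustrative):

```python
import numpy as np

def hypnogram_targets(hypnogram, y_frequency=125, return_y_every_sec=30):
    # Pick one stage label per scoring epoch from a sample-rate hypnogram.
    step = y_frequency * return_y_every_sec
    return np.asarray(hypnogram)[::step]

# Three 30-second epochs of stages 0, 2, 3 expanded to 125 Hz, then reduced
hyp = np.repeat([0, 2, 3], 30 * 125)
targets = hypnogram_targets(hyp)
```

This yields one target per 30-second epoch, matching the usual sleep-staging granularity.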

Plotting


source

plot_edf_signals


def plot_edf_signals(
    signals, signal_names, signal_comparisons=None, use_resampler:bool=False, normalize:bool=False,
    title_text:str='', colorscale=None
):

Function to plot EDF signals by name, with optional comparison signals, plot resampling, and per-signal normalization.