Slumber
Constants
EDF Helpers
Read EDFs
read_edf
def read_edf(
file_path, # file path of edf
channels:NoneType=None, # channels in the EDF to read; a warning is raised for channels that do not exist
frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary
Reads an EDF file and returns a list of signals and the header dictionary, with the option to resample all signals to a passed frequency.
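The optional resampling that the frequency argument performs can be sketched with plain numpy linear interpolation (an illustrative stand-in; resample_signal is a hypothetical helper, not the library's actual implementation):

```python
import numpy as np

def resample_signal(signal, orig_freq, target_freq):
    """Linearly interpolate a 1-D signal from orig_freq to target_freq Hz
    (illustrative only; the library may use a different resampling method)."""
    duration = len(signal) / orig_freq            # recording length in seconds
    n_out = int(round(duration * target_freq))    # sample count at target rate
    t_orig = np.arange(len(signal)) / orig_freq   # original sample times
    t_new = np.arange(n_out) / target_freq        # target sample times
    return np.interp(t_new, t_orig, signal)

# 2 s of a 1 Hz sine sampled at 100 Hz, resampled down to 50 Hz
x = np.sin(2 * np.pi * 1.0 * np.arange(200) / 100)
y = resample_signal(x, orig_freq=100, target_freq=50)
```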
read_edf_mne
def read_edf_mne(
file_path, # file path of edf
channels:NoneType=None, # channels in the EDF to read; a warning is raised for channels that do not exist
frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary
Reads an EDF file with the MNE library. Not recommended; use read_edf_edfio instead.
read_edf_edfio
def read_edf_edfio(
file_path, # file path of edf
channels:NoneType=None, # channels in the EDF to read; a warning is raised for channels that do not exist
frequency:NoneType=None, # frequency to resample all signals to
)->Union: # tuple of signals and header dictionary
Reads an EDF file with the edfio library.
Read Hypnograms
read_hypnogram
def read_hypnogram(
file, # file path of the hypnogram csv
epoch_length:NoneType=None, # epoch length of the hypnogram measurements; if passed, each element is repeated this many times
)->array: # numpy array of hypnogram
Reads a hypnogram CSV and returns a numpy array of the hypnogram; if epoch_length is passed, each stage label is repeated that many times.
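The optional epoch_length expansion can be sketched in a few lines (expand_hypnogram is a hypothetical helper, not the library's read_hypnogram itself):

```python
import numpy as np

def expand_hypnogram(stages, epoch_length):
    """Repeat each per-epoch stage label epoch_length times, e.g. one
    label per second for 30 s epochs (illustrative sketch only)."""
    return np.repeat(np.asarray(stages), epoch_length)

# three 30 s epochs scored Wake (0), N2 (2), N3 (3)
hyp = expand_hypnogram([0, 2, 3], epoch_length=30)
```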
EDFs to Zarr
edf_signals_to_zarr
def edf_signals_to_zarr(
edf_file_path, write_data_dir, overwrite:bool=False, channels:NoneType=None, channel_name_map:NoneType=None,
frequency:NoneType=None, hyp_epoch_length:int=30, hyp_data_dir:NoneType=None
):
Converts an EDF file to a Zarr store.
try_mne: attempts to load the file with MNE instead of pyedflib if pyedflib raises an error. Use with caution: MNE converts units (and potentially resamples), while pyedflib does not.
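One piece of this conversion, renaming raw EDF channel labels via channel_name_map before writing, can be sketched as follows (canonicalize_channels is a hypothetical helper written for illustration, not the library's actual code):

```python
def canonicalize_channels(signals, channel_name_map):
    """Rename raw EDF channel labels to canonical names before writing
    to zarr; labels absent from the map are kept unchanged."""
    return {channel_name_map.get(name, name): data
            for name, data in signals.items()}

# two raw channels; only the EEG label needs renaming
raw = {"EEG(sec)": [0.1, 0.2], "SaO2": [97, 98]}
canon = canonicalize_channels(raw, {"EEG(sec)": "C3-M2"})
```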
Datasets
Self Supervised Dataset
remove_wake_epochs_from_signals
def remove_wake_epochs_from_signals(
X, hypnogram, resampled_hypnogram_length, padding_mask:int=-100
):
Trims wake epochs (if wake is the largest class) from signals.
Shapes: X: (bs, channels, seq_len); hypnogram: (bs, seq_len / resampled_hypnogram_length); sequence_padding_mask: (bs, seq_len)
trim_wake_epochs_from_hypnogram
def trim_wake_epochs_from_hypnogram(
hypnogram, padding_mask:int=-100
):
Trims wake epochs (if wake is the largest class) from the beginning and/or end of a hypnogram.
Adapted from Phan et al., L-SeqSleepNet.
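The trimming rule described above (drop leading and trailing wake runs only when wake is the majority class) can be sketched with numpy (trim_leading_trailing_wake is an illustrative reimplementation, not the library's exact code; it assumes wake is encoded as stage 0):

```python
import numpy as np

def trim_leading_trailing_wake(hypnogram, wake_stage=0):
    """If wake is the largest class, drop the runs of wake epochs at the
    beginning and end of the hypnogram; otherwise return it unchanged."""
    hyp = np.asarray(hypnogram)
    stages, counts = np.unique(hyp, return_counts=True)
    if stages[np.argmax(counts)] != wake_stage:
        return hyp                      # wake is not the majority class
    non_wake = np.flatnonzero(hyp != wake_stage)
    if non_wake.size == 0:
        return hyp[:0]                  # recording is all wake
    return hyp[non_wake[0]:non_wake[-1] + 1]

# wake-majority recording: leading/trailing wake removed, inner wake kept
trimmed = trim_leading_trailing_wake([0, 0, 0, 2, 3, 2, 0, 0])
```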
Sleep Stage Self Supervised Dataset
SelfSupervisedHypnogramTimeDataset
def SelfSupervisedHypnogramTimeDataset(
zarr_files, # zarr files that include samples
channels, # channels to use
max_seq_len_sec, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
sample_stride_sec, # if no sample_df, stride in seconds between consecutive samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
frequency, # frequency of underlying data
min_seq_len_sec:NoneType=None, # minimum sequence length (in seconds) to use
start_offset_sec:int=0, # number of seconds to exclude from beginning of sleep studies
trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms, if it is the largest class
include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is false
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file includes a sample
return_hypnogram_every_sec:int=30, # step, in seconds, at which hypnogram labels are indexed and returned
hypnogram_padding_mask:int=-100, # padding value added to the target; indices with this value are ignored when computing the loss
hypnogram_frequency:int=1, # frequency of underlying y hypnogram data
butterworth_filters:dict={'ECG': [None, 0.3], 'ECG (LL-RA)': [None, 0.3], 'EKG': [None, 0.3], 'ECG (L-R)': [None, 0.3], 'ECG2': [None, 0.3], 'EOG(L)': [0.3, 45], 'EOG-L': [0.3, 45], 'E1': [0.3, 45], 'LOC': [0.3, 45], 'E1-M2': [0.3, 45], 'E1-AVG': [0.3, 45], 'EMG': [None, 10], 'cchin_l': [None, 10], 'chin': [None, 10], 'EMG (L-R)': [None, 10], 'Chin 1-Chin 2': [None, 10], 'EMG (1-2)': [None, 10], 'EMG (1-3)': [None, 10], 'Chin3': [None, 10], 'cchin': [None, 10], 'C4-M1': [0.3, 45], 'C4_M1': [0.3, 45], 'EEG': [0.3, 45], 'EEG3': [0.3, 45], 'C3-M2': [0.3, 45], 'C3_M2': [0.3, 45], 'EEG(sec)': [0.3, 45], 'C4-AVG': [0.3, 45], 'C3-AVG': [0.3, 45], 'THOR RES': [0.1, 15], 'Thor': [0.1, 15], 'thorax': [0.1, 15], 'Thoracic': [0.1, 15], 'Chest': [0.1, 15], 'ABDO RES': [0.1, 15], 'abdomen': [0.1, 15], 'Abdo': [0.1, 15], 'Abdominal': [0.1, 15], 'ABD': [0.1, 15], 'Abd': [0.1, 15]}, # dictionary mapping channel names to a pair of Butterworth cutoff frequencies in Hz (a None entry disables that cutoff), yielding low-pass, high-pass, or band-pass filtering per channel
median_filter_kernel_size:NoneType=None, # if not None, applies a median filter with this kernel size
voltage_channels:NoneType=None, # if not None, these channels' units are inspected and converted to microvolts (from mV, uV, etc.)
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} used to clip out-of-range values and interpolate over the clipped samples
return_hyponogram:bool=True, # indicator to return the hypnogram (e.g. for a classification task); otherwise returns X, X
normalize_signals:bool=True, # indicator to normalize signals
constant_nan_tolerance:float=1.0, # tolerance for nan values in signals
constant_channels:list=['SaO2', 'SpO2', 'spo2', 'SPO2'], # channels to check for constant values
hypnogram_required_stages:list=[0, 1, 2, 3, 4], # hypnogram stages that must be present in a sample
hypnogram_constant_tolerance:float=1.0, # tolerance for constant values in hypnogram
):
A map-style :class:~torch.utils.data.Dataset over zarr files of sleep-study signals. Each sample is a signal window X over the selected channels; the target is the corresponding hypnogram when return_hyponogram is true, otherwise X itself (self-supervised reconstruction).
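The way sample_seq_len_sec and sample_stride_sec partition a recording into samples when no sample_df is given can be sketched as follows (window_starts is a hypothetical helper that mirrors the documented arguments, not the dataset's actual indexing code):

```python
def window_starts(total_sec, sample_seq_len_sec, sample_stride_sec,
                  include_partial=True, min_seq_len_sec=0):
    """Return the start times (in seconds) of the windows cut from one
    recording: full windows every sample_stride_sec seconds, plus a
    trailing partial window when include_partial is true and at least
    min_seq_len_sec seconds remain (illustrative sketch only)."""
    starts = []
    t = 0
    while t < total_sec:
        remaining = total_sec - t
        if remaining >= sample_seq_len_sec or \
           (include_partial and remaining >= min_seq_len_sec):
            starts.append(t)
        t += sample_stride_sec
    return starts

# 100 s recording, 30 s windows with a 30 s stride (no overlap)
starts = window_starts(100, 30, 30, include_partial=True, min_seq_len_sec=10)
```

With sample_stride_sec equal to sample_seq_len_sec the windows tile the recording without overlap; a smaller stride produces overlapping windows.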
padded_tensor_sequence_collate
def padded_tensor_sequence_collate(
batch, max_seq_len, frequency, return_hypnogram:bool=False, hypnogram_frequency:int=1, X_pad_value:float=0.0,
hypnogram_padding_mask:int=-100
):
Collate function for variable-length sequences with padding.
Args: batch: list of (X, Y) tuples, where X is a tensor of shape (channels, seq_len) and Y is a target tensor of shape (channels, seq_len) or a hypnogram.
Returns: X: tensor containing the batch of padded sequences; Y: tensor containing the batch of targets.
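The padding behavior can be sketched with a numpy stand-in (pad_batch is illustrative; the real collate operates on torch tensors and also handles hypnogram targets):

```python
import numpy as np

def pad_batch(batch, max_seq_len, pad_value=0.0):
    """Right-pad each (channels, seq_len) array to max_seq_len with
    pad_value and stack into a (bs, channels, max_seq_len) batch."""
    padded = []
    for X in batch:
        channels, seq_len = X.shape
        out = np.full((channels, max_seq_len), pad_value, dtype=X.dtype)
        out[:, :seq_len] = X                  # copy real samples, keep padding
        padded.append(out)
    return np.stack(padded)

# two 2-channel sequences of different lengths, padded to 5 samples
batch = [np.ones((2, 3)), np.ones((2, 5))]
Xb = pad_batch(batch, max_seq_len=5)
```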
nested_tensor_collate
def nested_tensor_collate(
batch
):
Collate function for variable-length sequences using NestedTensor.
Args: batch: list of (X, Y, idx) tuples, where X is a tensor of shape (channels, seq_len), Y is a single outcome, and idx is an optional index of the sample in the dataset.
Returns: X_nested: NestedTensor containing the batch of sequences; Y: tensor of targets of shape (bs); idx: optional tensor of indices.
nested_tensor_sequence_collate
def nested_tensor_sequence_collate(
batch
):
Collate function for variable-length sequences using NestedTensor.
Args: batch: list of (X, Y) tuples, where X is a tensor of shape (channels, seq_len) and Y is a target tensor of shape (channels, seq_len) or a hypnogram.
Returns: X_nested: NestedTensor containing the batch of sequences; Y: tensor containing the batch of targets.
nested_tensor_sequence_multi_label_collate
def nested_tensor_sequence_multi_label_collate(
batch
):
Collate function for variable-length sequences using NestedTensor.
Args: batch: list of (X, Y, time) tuples, where X is a tensor of shape (channels, seq_len), Y is a tensor of shape (n_events), and time is a tensor of shape (n_events).
Returns: X_nested: NestedTensor containing the batch of sequences; Y: tensor containing the batch of targets; time: tensor containing the batch of event times.
padded_tensor_sequence_multi_label_collate
def padded_tensor_sequence_multi_label_collate(
batch, max_seq_len, X_pad_value:float=0.0
):
Collate function for variable-length sequences with padding.
Args: batch: list of (X, Y, time) tuples, where X is a tensor of shape (channels, seq_len), Y is a tensor of shape (n_events), and time is a tensor of shape (n_events).
Returns: X: tensor containing the batch of padded sequences; Y: tensor containing the batch of targets; time: tensor containing the batch of event times.
Single Outcome (EDS, CV) Dataset
SingleOutcomeDataset
def SingleOutcomeDataset(
zarr_files, # zarr files that include samples
channels, # channels to use
max_seq_len_sec, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
sample_stride_sec, # if no sample_df, stride in seconds between consecutive samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
y_outcome_df, # file path containing values for outcome of interest
min_seq_len_sec:NoneType=None, # minimum sequence length (in seconds) to use
trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms, if it is the largest class
return_hypnogram_every_sec:int=30, # step, in seconds, at which hypnogram labels are indexed and returned
hypnogram_padding_mask:int=-100, # padding value added to the target; indices with this value are ignored when computing the loss
hypnogram_frequency:int=125, # frequency of underlying y hypnogram data
y_mapping_column:str='filepath', # column mapping corresponding outcome value to zarr file
y_outcome:str='eds', # outcome column in the y file path
y_time_column:NoneType=None, # column in the y file path that contains the time of the event or censoring
y_demographic_columns:NoneType=None, # list of demographic columns to return as part of the outcome
y_demographic_norm_stats:dict={}, # dictionary of demographic columns to normalize with mean and std
include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is false
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file includes a sample
start_offset_sec:int=0, # number of seconds to exclude from beginning of sleep studies
frequency:int=128, # frequency of underlying data
butterworth_filters:dict={'ECG': [None, 0.3], 'ECG (LL-RA)': [None, 0.3], 'EKG': [None, 0.3], 'ECG (L-R)': [None, 0.3], 'ECG2': [None, 0.3], 'EOG(L)': [0.3, 45], 'EOG-L': [0.3, 45], 'E1': [0.3, 45], 'LOC': [0.3, 45], 'E1-M2': [0.3, 45], 'E1-AVG': [0.3, 45], 'EMG': [None, 10], 'cchin_l': [None, 10], 'chin': [None, 10], 'EMG (L-R)': [None, 10], 'Chin 1-Chin 2': [None, 10], 'EMG (1-2)': [None, 10], 'EMG (1-3)': [None, 10], 'Chin3': [None, 10], 'cchin': [None, 10], 'C4-M1': [0.3, 45], 'C4_M1': [0.3, 45], 'EEG': [0.3, 45], 'EEG3': [0.3, 45], 'C3-M2': [0.3, 45], 'C3_M2': [0.3, 45], 'EEG(sec)': [0.3, 45], 'C4-AVG': [0.3, 45], 'C3-AVG': [0.3, 45], 'THOR RES': [0.1, 15], 'Thor': [0.1, 15], 'thorax': [0.1, 15], 'Thoracic': [0.1, 15], 'Chest': [0.1, 15], 'ABDO RES': [0.1, 15], 'abdomen': [0.1, 15], 'Abdo': [0.1, 15], 'Abdominal': [0.1, 15], 'ABD': [0.1, 15], 'Abd': [0.1, 15]}, # dictionary mapping channel names to a pair of Butterworth cutoff frequencies in Hz (a None entry disables that cutoff), yielding low-pass, high-pass, or band-pass filtering per channel
median_filter_kernel_size:NoneType=None, # if not None, applies a median filter with this kernel size
voltage_channels:NoneType=None, # if not None, these channels' units are inspected and converted to microvolts (from mV, uV, etc.)
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} used to clip out-of-range values and interpolate over the clipped samples
normalize_signals:bool=True, # indicator to normalize signals
constant_nan_tolerance:float=1.0, # tolerance for nan values in signals
constant_channels:list=['SaO2', 'SpO2', 'spo2', 'SPO2'], # channels to check for constant values
):
A map-style :class:~torch.utils.data.Dataset over zarr files of sleep-study signals. Each sample is a signal window X paired with a single outcome value (e.g. EDS) looked up in y_outcome_df, with optional event/censoring time and demographic features.
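The z-scoring implied by y_demographic_columns and y_demographic_norm_stats can be sketched as follows (normalize_demographics is a hypothetical helper; the stats-dictionary layout with 'mean'/'std' keys is an assumption, not the dataset's documented format):

```python
def normalize_demographics(row, columns, norm_stats):
    """Z-score each demographic column that has normalization stats;
    columns without stats are passed through unchanged (sketch only)."""
    out = []
    for col in columns:
        value = row[col]
        if col in norm_stats:  # assumed layout: {"mean": ..., "std": ...}
            mean, std = norm_stats[col]["mean"], norm_stats[col]["std"]
            value = (value - mean) / std
        out.append(value)
    return out

# age is normalized; bmi has no stats and is returned as-is
feats = normalize_demographics(
    {"age": 60, "bmi": 25},
    columns=["age", "bmi"],
    norm_stats={"age": {"mean": 50, "std": 10}},
)
```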
SingleOutcomeHypnogramDataset
def SingleOutcomeHypnogramDataset(
zarr_files, # zarr files that include samples
channels, # channels to use
max_seq_len_sec, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
sample_stride_sec, # if no sample_df, stride in seconds between consecutive samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
y_outcome_df, # file path containing values for outcome of interest
min_seq_len_sec:NoneType=None, # minimum sequence length (in seconds) to use
pad:bool=True,
hypnogram_padding_mask:int=-100, # padding value added to the target; indices with this value are ignored when computing the loss
hypnogram_frequency:int=125, # frequency of underlying y hypnogram data
y_mapping_column:str='filepath', # column mapping corresponding outcome value to zarr file
y_outcome:str='eds', # outcome column in the y file path
y_time_column:NoneType=None, # column in the y file path that contains the time of the event or censoring
include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is false
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file includes a sample
start_offset_sec:int=0, # number of seconds to exclude from beginning of sleep studies
frequency:int=128, # frequency of underlying data
butterworth_filters:dict={'ECG': [None, 0.3], 'ECG (LL-RA)': [None, 0.3], 'EKG': [None, 0.3], 'ECG (L-R)': [None, 0.3], 'ECG2': [None, 0.3], 'EOG(L)': [0.3, 45], 'EOG-L': [0.3, 45], 'E1': [0.3, 45], 'LOC': [0.3, 45], 'E1-M2': [0.3, 45], 'E1-AVG': [0.3, 45], 'EMG': [None, 10], 'cchin_l': [None, 10], 'chin': [None, 10], 'EMG (L-R)': [None, 10], 'Chin 1-Chin 2': [None, 10], 'EMG (1-2)': [None, 10], 'EMG (1-3)': [None, 10], 'Chin3': [None, 10], 'cchin': [None, 10], 'C4-M1': [0.3, 45], 'C4_M1': [0.3, 45], 'EEG': [0.3, 45], 'EEG3': [0.3, 45], 'C3-M2': [0.3, 45], 'C3_M2': [0.3, 45], 'EEG(sec)': [0.3, 45], 'C4-AVG': [0.3, 45], 'C3-AVG': [0.3, 45], 'THOR RES': [0.1, 15], 'Thor': [0.1, 15], 'thorax': [0.1, 15], 'Thoracic': [0.1, 15], 'Chest': [0.1, 15], 'ABDO RES': [0.1, 15], 'abdomen': [0.1, 15], 'Abdo': [0.1, 15], 'Abdominal': [0.1, 15], 'ABD': [0.1, 15], 'Abd': [0.1, 15]}, # dictionary mapping channel names to a pair of Butterworth cutoff frequencies in Hz (a None entry disables that cutoff), yielding low-pass, high-pass, or band-pass filtering per channel
median_filter_kernel_size:NoneType=None, # if not None, applies a median filter with this kernel size
voltage_channels:NoneType=None, # if not None, these channels' units are inspected and converted to microvolts (from mV, uV, etc.)
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} used to clip out-of-range values and interpolate over the clipped samples
normalize_signals:bool=True, # indicator to normalize signals
constant_nan_tolerance:float=1.0, # tolerance for nan values in signals
constant_channels:list=['SaO2', 'SpO2', 'spo2', 'SPO2'], # channels to check for constant values
):
A map-style :class:~torch.utils.data.Dataset over zarr files of sleep-study signals. Each sample pairs a hypnogram-based input with a single outcome value looked up in y_outcome_df, with optional event/censoring time.
Plotting
animated_plotly_line_chart
def animated_plotly_line_chart(
X, Y, frames:int=100
):
Creates an animated Plotly line chart of Y against X, revealed over the given number of frames.
reset_plot_state
def reset_plot_state(
):
update_plot_state
def update_plot_state(
):
plot_edf_signals
def plot_edf_signals(
signals, signal_names, signal_comparisons:NoneType=None, use_resampler:bool=False, normalize:bool=False,
title_text:str='', colorscale:NoneType=None
):
Plots the given signals with Plotly, optionally alongside comparison signals, with optional normalization, display-time resampling (use_resampler), a title, and a custom colorscale.
get_hyp_data
def get_hyp_data(
path
):
Loads hypnogram data from the given path.
transform_hyp_to_df
def transform_hyp_to_df(
json_ex
):
Call self as a function.