Slumber

I’m tired too. Here are some things that might help.

Constants

EDF Helpers

Read EDFs


source

read_edf


def read_edf(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
):

Read an EDF file and return a list of signals and a header, with the option to resample all signals to a passed frequency.
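The optional resampling can be pictured with a minimal linear-interpolation sketch (the function below is illustrative only; the real `read_edf` may use a polyphase or FFT-based resampler instead):

```python
import numpy as np

def resample_signal(signal, orig_freq, target_freq):
    # Minimal linear-interpolation resampler (sketch, not the library's method).
    duration = len(signal) / orig_freq
    n_target = int(round(duration * target_freq))
    old_t = np.arange(len(signal)) / orig_freq
    new_t = np.arange(n_target) / target_freq
    return np.interp(new_t, old_t, signal)

# 10 s of a 1 Hz sine sampled at 100 Hz, resampled to 125 Hz
sig = np.sin(2 * np.pi * 1.0 * np.arange(1000) / 100)
resampled = resample_signal(sig, orig_freq=100, target_freq=125)
```

Resampling to a common frequency is what lets signals from different channels be stacked into one array downstream.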


source

read_edf_mne


def read_edf_mne(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
)->tuple: # tuple of signals and header dictionary

Read an EDF file with the MNE library. Using this is not recommended; use read_edf_edfio instead.


source

read_edf_edfio


def read_edf_edfio(
    file_path, # file path of the EDF
    channels:list|None=None, # channels in the EDF to read; a warning is raised for channels that do not exist
    frequency:int|None=None, # frequency to resample all signals to
)->tuple: # tuple of signals and header dictionary

Read an EDF file with the edfio library.

Read Hypnograms


source

read_hypnogram


def read_hypnogram(
    file, # file path of the hypnogram csv
    epoch_length:int|None=None, # epoch length of the hypnogram measurements; if passed, each element is repeated this many times
)->np.ndarray: # numpy array of the hypnogram

Read a hypnogram CSV and return it as a numpy array, optionally repeating each stage label.
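The `epoch_length` repeat can be illustrated directly with `numpy.repeat` (the stage values here are made up for the example):

```python
import numpy as np

# Hypothetical hypnogram: one stage label per 30-second scoring epoch.
stages = np.array([0, 0, 2, 3, 2, 0])

# Passing epoch_length=30 repeats each label 30 times, yielding one
# label per second, aligned with per-second signal features.
per_second = np.repeat(stages, 30)
```

This expansion makes the hypnogram indexable at the same granularity as the resampled signals.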

EDFs to Zarr


source

edf_signals_to_zarr


def edf_signals_to_zarr(
    edf_file_path, write_data_dir, overwrite:bool=False, channels:list|None=None, channel_name_map:dict|None=None,
    frequency:int|None=None, hyp_epoch_length:int=30, hyp_data_dir:str|None=None
):

Convert an EDF file to a Zarr store.

try_mne: falls back to loading files with MNE instead of pyedflib when pyedflib raises an error. This is risky, as MNE converts units (and potentially resamples), while pyedflib does not.
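The unit concern above comes down to a scale factor per channel. A minimal sketch of normalising voltage channels to microvolts (the helper and its unit table are hypothetical; the real pipeline reads units from the EDF header):

```python
import numpy as np

# Hypothetical scale table from reported unit to microvolts.
UNIT_TO_UV = {'uv': 1.0, 'µv': 1.0, 'mv': 1_000.0, 'v': 1_000_000.0}

def to_microvolts(signal, unit):
    # Rescale a voltage signal to microvolts based on its declared unit.
    scale = UNIT_TO_UV.get(unit.strip().lower())
    if scale is None:
        raise ValueError(f'unknown voltage unit: {unit!r}')
    return np.asarray(signal) * scale

ecg_mv = np.array([0.5, -0.25])       # ECG recorded in millivolts
ecg_uv = to_microvolts(ecg_mv, 'mV')  # now in microvolts
```

Normalising units explicitly at conversion time avoids the silent rescaling that makes the MNE fallback dangerous.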

Datasets

Self Supervised Dataset


source

trim_wake_epochs_from_signals


def trim_wake_epochs_from_signals(
    X, hypnogram, sequence_padding_mask, resampled_hypnogram_length, mask_x_with_zeros:bool=False,
    padding_mask:int=-100
):

Trim wake epochs from signals when wake is the largest class.

X: (bs, channels, seq_len)
hypnogram: (bs, seq_len / resampled_hypnogram_length)
sequence_padding_mask: (bs, seq_len)


source

trim_wake_epochs_from_hypnogram


def trim_wake_epochs_from_hypnogram(
    hypnogram, padding_mask:int=-100
):

Trim wake epochs from a hypnogram when wake is the largest class. Wake epochs are trimmed from the beginning and/or end of the hypnogram.

Adapted from L-SeqSleepNet (Phan et al.)
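A minimal numpy sketch of the trimming idea, assuming wake is stage 0 and -100 is the padding value (the real function's exact behaviour, e.g. any wake buffer it keeps, may differ):

```python
import numpy as np

WAKE, PAD = 0, -100

def trim_wake(hypnogram, padding_mask=PAD):
    # If wake is the most common stage, mask leading/trailing wake
    # epochs with the padding value so the loss ignores them.
    hyp = np.asarray(hypnogram).copy()
    valid = hyp != padding_mask
    stages, counts = np.unique(hyp[valid], return_counts=True)
    if stages[np.argmax(counts)] != WAKE:
        return hyp  # wake is not the largest class; leave untouched
    idx = np.flatnonzero(valid & (hyp != WAKE))
    if idx.size == 0:
        hyp[valid] = padding_mask  # all-wake recording
        return hyp
    hyp[:idx[0]][hyp[:idx[0]] == WAKE] = padding_mask
    hyp[idx[-1] + 1:][hyp[idx[-1] + 1:] == WAKE] = padding_mask
    return hyp

trimmed = trim_wake(np.array([0, 0, 0, 2, 3, 2, 0, 0]))
```

Only the edges are touched: wake epochs between sleep stages are kept, since they are legitimate arousals.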


source

SelfSupervisedTimeFrequencyDataset


def SelfSupervisedTimeFrequencyDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (especially relevant when returning both STFT and raw time-series data, to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    start_offset_sec:int=0, # number of seconds to exclude from the beginning of sleep studies
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms if wake is the largest class
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is False
    sample_df:pd.DataFrame|None=None, # dataframe indicating which indices within each zarr file constitute a sample
    frequency:int=125, # frequency of the underlying data
    return_hypnogram_every_sec:int=30, # step, in seconds, at which hypnogram labels are returned
    hypnogram_padding_mask:int=-100, # padding value added to the target and ignored when computing the loss
    hypnogram_frequency:int=125, # frequency of the underlying hypnogram data
    butterworth_filters:dict|None=None, # dictionary of low-pass, high-pass, and band-pass settings to apply per channel
    median_filter_kernel_size:int|None=None, # if not None, apply a median filter with this kernel size
    voltage_channels:list=['ECG', 'ECG (LL-RA)', 'EKG', 'ECG (L-R)', 'EOG(L)', 'EOG-L', 'E1', 'LOC', 'E1-M2', 'E1-AVG', 'EMG', 'cchin_l', 'chin', 'EMG (L-R)', 'EMG (1-2)', 'EMG (1-3)', 'Chin3', 'C4-M1', 'C4_M1', 'EEG', 'EEG1', 'EEG2', 'EEG3', 'C3-M2', 'C3_M2', 'C4-AVG'], # if not None, these channels' units are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:dict|None=None, # dictionary of channel:{'phys_range':..., 'percentiles':...} for clipping and interpolation of clipped values
    scale_channels:bool=False, # indicator to scale channels to the mean and std of the zarr files
    time_channel_scales:dict|None=None, # dictionary of channel:mean and channel:std values for scaling; should use training statistics
    return_sequence_padding_mask:bool=False, # indicator to return the key padding mask for attention masking
):

A map-style torch.utils.data.Dataset that yields self-supervised time-frequency (and optionally raw time-series) samples read from Zarr stores, with optional filtering, unit conversion, scaling, and wake-epoch trimming.
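When no `sample_df` is passed, samples are generated from `sample_seq_len_sec` and `sample_stride_sec`. A minimal sketch of how window start indices could be enumerated over one array (the function name and details are illustrative, not the class's internals):

```python
def sample_start_indices(n_samples, seq_len_sec, stride_sec, frequency=125,
                         include_partial=True):
    # Enumerate window start indices over an array of n_samples points.
    seq_len = seq_len_sec * frequency
    stride = stride_sec * frequency
    starts = list(range(0, n_samples, stride))
    if not include_partial:
        # Keep only windows that fit entirely within the array.
        starts = [s for s in starts if s + seq_len <= n_samples]
    return starts

# 1 hour of 125 Hz data, 5-minute windows, no overlap, full windows only
starts = sample_start_indices(3600 * 125, 300, 300, include_partial=False)
```

With `stride_sec == seq_len_sec` the windows tile the recording without overlap; a smaller stride produces overlapping samples.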

Sleep Stage Supervised Dataset


source

HypnogramTimeFrequencyDataset


def HypnogramTimeFrequencyDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec, # maximum sequence length (in seconds) to use (especially relevant when returning both STFT and raw time-series data, to keep them in sync)
    sample_seq_len_sec, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec, # if no sample_df, stride in seconds between samples from the same array; if sample_stride_sec == sample_seq_len_sec, there is no overlap
    start_offset_sec:int=0, # number of seconds to exclude from the beginning of sleep studies
    trim_wake_epochs:bool=True, # indicator to trim wake epochs from hypnograms if wake is the largest class
    include_partial_samples:bool=True, # indicator to include data from partial samples when return_full_length is False
    sample_df:pd.DataFrame|None=None, # dataframe indicating which indices within each zarr file constitute a sample
    frequency:int=125, # frequency of the underlying data
    return_y_every_sec:int=30, # step, in seconds, at which target labels are returned
    y_padding_mask:int=-100, # padding value added to the target and ignored when computing the loss
    y_frequency:int=125, # frequency of the underlying y hypnogram data
    butterworth_filters:dict|None=None, # dictionary of low-pass, high-pass, and band-pass settings to apply per channel
    median_filter_kernel_size:int|None=None, # if not None, apply a median filter with this kernel size
    voltage_channels:list=['ECG', 'ECG (LL-RA)', 'EKG', 'ECG (L-R)', 'EOG(L)', 'EOG-L', 'E1', 'LOC', 'E1-M2', 'E1-AVG', 'EMG', 'cchin_l', 'chin', 'EMG (L-R)', 'EMG (1-2)', 'EMG (1-3)', 'Chin3', 'C4-M1', 'C4_M1', 'EEG', 'EEG1', 'EEG2', 'EEG3', 'C3-M2', 'C3_M2', 'C4-AVG'], # if not None, these channels' units are inspected and converted to microvolts (from mV, V, etc.)
    clip_interpolations:dict|None=None, # dictionary of channel:{'phys_range':..., 'percentiles':...} for clipping and interpolation of clipped values
    scale_channels:bool=False, # indicator to scale channels to the mean and std of the zarr files
    time_channel_scales:dict|None=None, # dictionary of channel:mean and channel:std values for scaling; should use training statistics
    return_sequence_padding_mask:bool=False, # indicator to return the key padding mask for attention masking
):

A map-style torch.utils.data.Dataset that yields time-frequency samples together with sleep-stage (hypnogram) targets read from Zarr stores, with optional filtering, unit conversion, scaling, and wake-epoch trimming.
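The `return_y_every_sec` and `y_frequency` parameters together determine how a sample-rate hypnogram is reduced to one target label per scoring epoch. A minimal sketch, assuming one label per sample at `y_frequency` (the helper name is illustrative):

```python
import numpy as np

def hypnogram_targets(hypnogram, y_frequency=125, return_y_every_sec=30):
    # Pick one stage label per scoring epoch from a sample-rate hypnogram.
    step = y_frequency * return_y_every_sec
    return np.asarray(hypnogram)[::step]

# Three 30-second epochs of stages 0, 2, 3 expanded to 125 Hz, then reduced
hyp = np.repeat([0, 2, 3], 30 * 125)
targets = hypnogram_targets(hyp)
```

This yields one target per 30-second epoch, matching the usual sleep-staging granularity.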

Plotting


source

plot_edf_signals


def plot_edf_signals(
    signals, signal_names, signal_comparisons=None, use_resampler:bool=False, normalize:bool=False,
    title_text:str='', colorscale=None
):

Function to plot EDF signals by name, with optional comparison signals, plot resampling, and per-signal normalization.