Bedside
count_waveforms
def count_waveforms(
file_header_dir:list, # Directory path of the yaml header files to be counted.
)->Counter: # Counter dictionary of the waveforms
Function to count the waveforms in the yaml header files
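The header layout is not shown here, but `parse_bedmaster_header` below builds a `WaveformData` mapping per file, so a plausible sketch of the counting step (the header structure is an assumption) is:

```python
from collections import Counter

def count_waveforms_from_headers(headers):
    # Hypothetical simplification: each header dict carries a 'WaveformData'
    # mapping of waveform name -> metadata (as in parse_bedmaster_header).
    counts = Counter()
    for header in headers:
        counts.update(header.get("WaveformData", {}).keys())
    return counts

headers = [
    {"WaveformData": {"I": {}, "II": {}}},
    {"WaveformData": {"II": {}, "SPO2": {}}},
]
counts = count_waveforms_from_headers(headers)  # Counter({'II': 2, 'I': 1, 'SPO2': 1})
```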
count_units
def count_units(
file_headers:list, # yaml header files to be counted.
)->Counter: # Counter dictionary of the units
Function to count the units/icus in the yaml header files
chunks
def chunks(
lst:list, # list to chunk
n, # size of each chunk
)->list: # generator yielding n-sized sublists
Yield successive n-sized chunks from lst.
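A minimal sketch of the n-sized chunking the docstring describes (the last chunk may be shorter when `len(lst)` is not a multiple of `n`):

```python
def chunks(lst, n):
    # Yield successive n-sized chunks from lst; the final chunk is
    # shorter when len(lst) is not divisible by n.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

parts = list(chunks([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```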
error_callback_handler
def error_callback_handler(
error, # An error raised from an exception
):
Function to handle/print errors in multiprocessing async methods.
parse_to_zarr
def parse_to_zarr(
file_path:str, # path of xml file to parse
file_header_directory:str, # path to file header directory
write_data_dir:str, # directory to write zarr files to
null_value:int=-32768, # null value to fill in segment discontinuities
resample_waves:dict={'I': (120, 240), 'II': (120, 240), 'III': (120, 240), 'V': (120, 240)}, # dictionary of waveform keys:tuple to resample the waveforms to
overwrite:bool=True, # boolean indicator to overwrite existing zarr files
waveforms:list=['I', 'II', 'III', 'V', 'RR', 'SPO2'], # list of waveforms to parse from xml
verbose:bool=False, # indicator to output verbose and use tqdm
):
Function to parse waveforms from a bedmaster xml file and write them to zarr.
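The `resample_waves` default maps each waveform name to a `(source_hz, target_hz)` pair, e.g. `'II': (120, 240)`. The library's actual resampler is not shown in these docs; a hypothetical linear-interpolation stand-in illustrates the shape of that step:

```python
def resample_linear(x, src_hz, dst_hz):
    # Hypothetical stand-in for the resample_waves step: stretch x from
    # src_hz to dst_hz by linear interpolation between neighbouring samples.
    n_out = int(len(x) * dst_hz / src_hz)
    out = []
    for i in range(n_out):
        pos = i * (len(x) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

up = resample_linear([0, 1, 2], 120, 240)  # 6 samples spanning 0..2
```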
create_bedmaster_header_file
def create_bedmaster_header_file(
file_path:str, # File path of bedmaster xml file
header_dir:str=None, # Directory to write the file's header file
overwrite:bool=False, # Indicator to overwrite the header file, if it already exists in the header_dir
)->None:
Function to create a file of the bedmaster metadata using the parse_bedmaster_header function.
parse_bedmaster_header
def parse_bedmaster_header(
file_path:str, # File path of bedmaster xml file
d:dict={'Filename': '', 'Size': '', 'Unit': '', 'Bed': '', 'FamilyType': '', 'PatientNames': [], 'PatientIDs': [], 'TotalSegments': 0, 'WaveformStartTime': 9223372036854775807, 'WaveformEndTime': 0, 'VitalStartTime': 9223372036854775807, 'VitalEndTime': 0, 'WaveformData': {}, 'VitalSign': {}, 'TotalAlarms': 0}, # Template dictionary to input xml file details into
)->dict: # Dictionary of the file's metadata.
Function to iteratively parse a bedmaster xml file. This function will try to fix broken xml files via the recover=True parameter in iterparse. It is specific to bedmaster data files in the bedmaster directory at /ar/scion/projects/mscic1/data/bedmaster/bedmaster_decompressed/. This attempts to get file, patient, waveform, vital, and alarm metadata from a single xml file. Additional metadata could be extracted by adding additional cases to this function.
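The event-by-event `iterparse` pattern this function uses can be sketched with the standard library; note the real implementation uses lxml's `recover=True` (which the stdlib parser lacks) to tolerate broken files, and the tag names below are purely illustrative:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

xml = b"<root><Filename>f1.xml</Filename><Bed>ICU-1</Bed></root>"
meta = {}
# Iterate over end-events, pull out tags of interest, and clear each
# element so memory stays flat on large files.
for _, elem in ET.iterparse(BytesIO(xml)):
    if elem.tag in ("Filename", "Bed"):
        meta[elem.tag] = elem.text
    elem.clear()
# meta == {'Filename': 'f1.xml', 'Bed': 'ICU-1'}
```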
unix_to_iso
def unix_to_iso(
d:Union, # Unix timestamp to convert to isoformat
)->str: # String of isoformat datetime representation
Function to convert unix timestamps to isoformat
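A minimal sketch of the conversion; UTC is an assumption here, since the docs don't say which timezone the library uses:

```python
from datetime import datetime, timezone

def unix_to_iso(d):
    # Convert a unix timestamp (seconds) to an ISO-8601 string.
    # UTC is an assumption - the library may use local time instead.
    return datetime.fromtimestamp(d, tz=timezone.utc).isoformat()

iso = unix_to_iso(0)  # '1970-01-01T00:00:00+00:00'
```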
clean_dict
def clean_dict(
d:dict, # Dictionary of keys and values to unlist values in lists from
)->dict: # Dictionary with values that were length 1 lists as single elements
Recursive function to unlist length 1 lists in dictionary values
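A sketch of the recursive unlisting the docstring describes (written non-destructively here; the real function may mutate in place):

```python
def clean_dict(d):
    # Recursively replace any length-1 list value with its single element;
    # longer lists and non-list values pass through unchanged.
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out[k] = clean_dict(v)
        elif isinstance(v, list) and len(v) == 1:
            out[k] = v[0]
        else:
            out[k] = v
    return out

cleaned = clean_dict({"a": [1], "b": {"c": ["x"]}, "d": [1, 2]})
# {'a': 1, 'b': {'c': 'x'}, 'd': [1, 2]}
```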
Dataloaders
ForecastingDataset
def ForecastingDataset(
channels, # channels to use
forecast_window_sec, # forecast window (within), suggest 5, 10, 15 minutes
outcome_df, # pandas dataframe containing outcomes for zarr files
outcome_df_outcome_col, # name of the outcome column in outcome_df
file_col:str='file_path', # column indicating zarr file path
y_date_column:str='date', # column indicating date of sample collection
outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
frequency:int=125, # frequency of underlying data
butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
constant_nan_tolerance:float=0.5, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
sample_frequency_key:str='sampling_frequency'
):
An abstract class representing a :class:Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite :meth:__getitem__, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite :meth:__len__, which is expected to return the size of the dataset by many :class:~torch.utils.data.Sampler implementations and the default options of :class:~torch.utils.data.DataLoader. Subclasses could also optionally implement :meth:__getitems__ to speed up batched sample loading; this method accepts a list of sample indices for a batch and returns the corresponding list of samples.
.. note:: :class:~torch.utils.data.DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
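The map-style contract described above can be sketched without torch at all (the real classes subclass torch.utils.data.Dataset; this stand-in just shows the `__len__`/`__getitem__` interface the DataLoader's default sampler relies on):

```python
class WindowDataset:
    # Minimal map-style dataset sketch: integer keys, one fixed-length
    # window of the signal per key.
    def __init__(self, signal, win):
        self.signal, self.win = signal, win

    def __len__(self):
        # number of non-overlapping windows available
        return len(self.signal) // self.win

    def __getitem__(self, i):
        start = i * self.win
        return self.signal[start:start + self.win]

ds = WindowDataset(list(range(10)), 4)
# len(ds) == 2; ds[1] == [4, 5, 6, 7]
```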
BedmasterForecastingDataset
def BedmasterForecastingDataset(
channels, # channels to use
forecast_window_sec, # forecast window (within), suggest 5, 10, 15 minutes
outcome_df, # pandas dataframe containing outcomes for zarr files
outcome_df_outcome_col, # name of the outcome column in outcome_df
file_col:str='file_path', # column indicating zarr file path
y_date_column:str='date', # column indicating date of sample collection
outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
frequency:int=125, # frequency of underlying data
butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
constant_nan_tolerance:float=0.5, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
sample_frequency_key:str='sample_rate',
calibrations:dict={'ART1': 0.2, 'ART2': 0.2, 'I': 2.44, 'II': 2.44, 'III': 2.44, 'V': 2.44}
):
An abstract class representing a :class:Dataset; see the subclassing and sampler notes under ForecastingDataset above.
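`calibrations` maps channel names to scale factors (e.g. 2.44 for the ECG leads). Assuming these are per-count multipliers applied to raw integer samples, which is a guess based on the defaults, a sketch of that step:

```python
def apply_calibrations(sample, channels, calibrations):
    # Hypothetical: scale each channel's raw counts by its calibration
    # factor; channels with no entry pass through unchanged.
    return [
        [v * calibrations.get(ch, 1.0) for v in row]
        for ch, row in zip(channels, sample)
    ]

scaled = apply_calibrations([[100, 200], [10, 20]], ["II", "RR"],
                            {"II": 2.44})
# [[244.0, 488.0], [10.0, 20.0]]
```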
BedmasterForecastingDatasetExtended
def BedmasterForecastingDatasetExtended(
channels, # channels to use
forecast_window_sec, # forecast window (within), suggest 5, 10, 15 minutes
outcome_df, # pandas dataframe containing outcomes for zarr files
outcome_df_outcome_col, # name of the outcome column in outcome_df
patch_heartbeats:bool=False, # indicator to patch channels based on heart beats of ECG index channel
ecg_index:int=2, # index of ECG channel in channels list
ecg_channel_name:str='II', # name of ECG channel in zarr file
fill_missing_beats:bool=False, # indicator to fill missing heart beats with interpolated values vs leaving as 0s
heartbeat_patch_len:int=256,
channel_processing_functions:NoneType=None, # dictionary of channel processing parameters
channel_quality_functions:NoneType=None, # dictionary of channel quality parameters
file_col:str='file_path', # column indicating zarr file path
y_date_column:str='date', # column indicating date of sample collection
outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
frequency:int=125, # frequency of underlying data
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
sample_frequency_key:str='sample_rate',
calibrations:dict={'ART1': 0.2, 'ART2': 0.2, 'I': 2.44, 'II': 2.44, 'III': 2.44, 'V': 2.44}
):
An abstract class representing a :class:Dataset; see the subclassing and sampler notes under ForecastingDataset above.
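`patch_heartbeats` with `heartbeat_patch_len=256` suggests fixed-length patches cut around detected beats in the ECG index channel. The beat-detection step is out of scope here; given beat indices, a hypothetical patching sketch:

```python
def patch_around_beats(signal, beat_idxs, patch_len=256):
    # Hypothetical sketch: cut a fixed-length patch centred on each detected
    # beat index, skipping beats too close to the edges of the signal.
    half = patch_len // 2
    patches = []
    for b in beat_idxs:
        lo, hi = b - half, b + half
        if lo >= 0 and hi <= len(signal):
            patches.append(signal[lo:hi])
    return patches

patches = patch_around_beats(list(range(1000)), [300, 500, 990])
# two full 256-sample patches; the beat at 990 is dropped (too near the end)
```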
SelfSupervisedDataset
def SelfSupervisedDataset(
zarr_files, # zarr files that include samples
channels, # channels to use
max_seq_len_sec:NoneType=None, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
sample_stride_sec:NoneType=None, # if no sample_df, stride in seconds between successive samples from the same array; when the stride equals sample_seq_len_sec there is no overlap
frequency:int=125, # frequency of underlying data
butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
constant_nan_tolerance:float=0.2, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
require_all_channels:bool=True, # indicator to require all channels to be present in the data
normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
patch_heartbeats:bool=False, # indicator to patch channels based on heart beats of ECG index channel
ecg_index:int=2, # index of ECG channel in channels list
heartbeat_patch_len:int=256
):
An abstract class representing a :class:Dataset; see the subclassing and sampler notes under ForecastingDataset above.
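When no `sample_df` is given, samples are generated by sliding a window of `sample_seq_len_sec` with stride `sample_stride_sec`. A hypothetical sketch of the window-start computation (in sample counts rather than seconds):

```python
def window_starts(n_samples, seq_len, stride):
    # Hypothetical: start indices of fixed-length training windows; when
    # stride == seq_len the windows tile the signal with no overlap.
    return list(range(0, n_samples - seq_len + 1, stride))

starts = window_starts(1000, 250, 250)  # [0, 250, 500, 750]
```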
nested_patches_ss_tensor_collate
def nested_patches_ss_tensor_collate(
batch
):
Collate function for variable length sequences using NestedTensor.
Args:
    batch: List of tuples (X, Y) where:
        X: Tensor of shape (num_patches, channels, patch_len)
        Y: Tensor of shape (num_patches, channels, patch_len)
Returns:
    X_nested: NestedTensor containing the batch of sequences
    Y_nested: NestedTensor containing the batch of sequences
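The real collate wraps the variable-length patch stacks in torch NestedTensors; a plain-Python stand-in shows the shape of a collate function that keeps each item's own length instead of padding to a common shape:

```python
def nested_collate(batch):
    # Stand-in for the NestedTensor collate: unzip (X, Y) pairs and keep
    # each variable-length item intact rather than padding to one shape.
    xs, ys = zip(*batch)
    return list(xs), list(ys)

batch = [([[1, 2, 3]], [[4, 5, 6]]),   # "sequence" of length 3
         ([[7, 8]], [[9, 10]])]        # "sequence" of length 2
X, Y = nested_collate(batch)  # X == [[[1, 2, 3]], [[7, 8]]]
```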
nested_patches_y_tensor_collate
def nested_patches_y_tensor_collate(
batch
):
Collate function for variable length sequences using NestedTensor.
Args:
    batch: List of tuples (X, Y) where:
        X: Tensor of shape (channels, seq_len)
        Y: single outcome (bs)
Returns:
    X_nested: NestedTensor containing the batch of sequences
    Y: Tensor containing the batch of targets