Bedside

Processing utilities for bedside/ICU data

source

count_waveforms


def count_waveforms(
    file_header_dir:list, # List of paths to the yaml header files to be counted.
)->Counter: # Counter dictionary of the waveforms

Function to count the waveforms in the yaml header files.


source

count_units


def count_units(
    file_headers:list, # List of yaml header files to be counted.
)->Counter: # Counter dictionary of the units

Function to count the units/ICUs in the yaml header files.
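
A hypothetical usage sketch, assuming both functions take lists of yaml header file paths; the glob location is a placeholder.

from glob import glob

header_files = glob("/data/headers/*.yaml")    # placeholder header directory
waveform_counts = count_waveforms(header_files)
unit_counts = count_units(header_files)
print(waveform_counts.most_common(5))          # e.g. [('II', 120), ...]
print(unit_counts.most_common(5))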


source

chunks


def chunks(
    lst:list, # list to chunk
    n, # size of each chunk
)->list: # successive n-sized chunks of lst

Yield successive n-sized chunks from lst.
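
A minimal sketch consistent with the docstring (successive n-sized chunks, with the last chunk possibly shorter):

def chunks(lst, n):
    "Yield successive n-sized chunks from lst."
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

list(chunks([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]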


source

error_callback_handler


def error_callback_handler(
    error, # An error raised from an exception
):

Function to handle/print errors raised in multiprocessing async methods.
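
A minimal usage sketch showing how the handler plugs into Pool.apply_async; risky_task is a stand-in for any worker function, such as parse_to_zarr below.

from multiprocessing import Pool

def risky_task(x):
    return 1 / x  # raises ZeroDivisionError when x == 0

if __name__ == "__main__":
    with Pool(4) as pool:
        for x in [1, 0, 2]:
            pool.apply_async(risky_task, args=(x,),
                             error_callback=error_callback_handler)
        pool.close()
        pool.join()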


source

parse_to_zarr


def parse_to_zarr(
    file_path:str, # path of xml file to parse
    file_header_directory:str, # path to file header directory
    write_data_dir:str, # directory to write zarr files to
    null_value:int=-32768, # null value to fill in segment discontinuities
    resample_waves:dict={'I': (120, 240), 'II': (120, 240), 'III': (120, 240), 'V': (120, 240)}, # dictionary mapping waveform keys to tuples specifying how to resample each waveform
    overwrite:bool=True, # boolean indicator to overwrite existing zarr files
    waveforms:list=['I', 'II', 'III', 'V', 'RR', 'SPO2'], # list of waveforms to parse from xml
    verbose:bool=False, # indicator to output verbose and use tqdm
):

Function to parse a bedmaster xml file and write the selected waveforms to zarr files in write_data_dir.

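A hypothetical single-file call relying on the documented defaults; all paths are placeholders.

parse_to_zarr(
    file_path="/data/bedmaster/example.xml",
    file_header_directory="/data/headers/",
    write_data_dir="/data/zarr/",
    waveforms=["II", "SPO2"],
    verbose=True,
)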

source

create_bedmaster_header_file


def create_bedmaster_header_file(
    file_path:str, # File path of bedmaster xml file
    header_dir:str=None, # Directory to write the file's header file
    overwrite:bool=False, # Indicator to overwrite the header file, if it already exists in the header_dir
)->None:

Function to create a file of the bedmaster metadata using the parse_bedmaster_header function.
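
A hypothetical call; paths are placeholders.

create_bedmaster_header_file(
    "/data/bedmaster/example.xml",
    header_dir="/data/headers/",
    overwrite=False,
)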


source

parse_bedmaster_header


def parse_bedmaster_header(
    file_path:str, # File path of bedmaster xml file
    d:dict={'Filename': '', 'Size': '', 'Unit': '', 'Bed': '', 'FamilyType': '', 'PatientNames': [], 'PatientIDs': [], 'TotalSegments': 0, 'WaveformStartTime': 9223372036854775807, 'WaveformEndTime': 0, 'VitalStartTime': 9223372036854775807, 'VitalEndTime': 0, 'WaveformData': {}, 'VitalSign': {}, 'TotalAlarms': 0}, # Template dictionary to input xml file details into
)->dict: # Dictionary of the file's metadata.

Function to iteratively parse a bedmaster xml file. This function will try to fix broken xml files via the recover=True parameter in iterparse. It is specific to bedmaster data files in the bedmaster directory at /ar/scion/projects/mscic1/data/bedmaster/bedmaster_decompressed/. This attempts to get file, patient, waveform, vital, and alarm metadata from a single xml file. Additional metadata could be extracted by adding additional cases to this function.
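
A minimal sketch of the recover-on-parse pattern the docstring describes, using lxml's iterparse with recover=True; the function name and tag handled here are illustrative, while the real parser accumulates file, patient, waveform, vital, and alarm metadata into the template dictionary.

from lxml import etree

def iter_bedmaster_xml(file_path):
    # iterparse streams the file element by element; recover=True lets the
    # parser continue past broken markup instead of raising
    for _, elem in etree.iterparse(file_path, events=("end",), recover=True):
        if elem.tag == "PatientName":   # hypothetical tag of interest
            print(elem.text)
        elem.clear()                    # release parsed elements to bound memory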


source

unix_to_iso


def unix_to_iso(
    d:Union, # Unix timestamp to convert to isoformat
)->str: # String of isoformat datetime representation

Function to convert unix timestamps to isoformat.
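
A minimal sketch, assuming UTC and second-resolution timestamps:

from datetime import datetime, timezone

def unix_to_iso(d):
    "Convert a unix timestamp to an isoformat datetime string."
    return datetime.fromtimestamp(float(d), tz=timezone.utc).isoformat()

unix_to_iso(1609459200)  # '2021-01-01T00:00:00+00:00'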


source

clean_dict


def clean_dict(
    d:dict, # Dictionary of keys and values to unlist values in lists from
)->dict: # Dictionary with values that were length 1 lists as single elements

Recursive function to unlist length-1 lists in dictionary values.
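
A minimal sketch consistent with the description above:

def clean_dict(d):
    "Recursively replace length-1 list values with their single element."
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out[k] = clean_dict(v)      # recurse into nested dictionaries
        elif isinstance(v, list) and len(v) == 1:
            out[k] = v[0]               # unlist single-element lists
        else:
            out[k] = v
    return out

clean_dict({'PatientIDs': ['123'], 'WaveformData': {'II': [500]}})
# -> {'PatientIDs': '123', 'WaveformData': {'II': 500}}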

Dataloaders


source

ForecastingDataset


def ForecastingDataset(
    channels, # channels to use
    forecast_window_sec, # forecast window in seconds (outcome occurs within this window); 5, 10, or 15 minutes suggested
    outcome_df, # pandas dataframe containing outcomes for zarr files
    outcome_df_outcome_col, # outcome column in outcome_df
    file_col:str='file_path', # column indicating zarr file path
    y_date_column:str='date', # column indicating date of sample collection
    outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
    outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
    sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
    frequency:int=125, # frequency of underlying data
    butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
    median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    constant_nan_tolerance:float=0.5, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
    require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
    infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
    normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
    sample_frequency_key:str='sampling_frequency' # zarr attribute key storing the sampling frequency
):

A map-style torch.utils.data.Dataset that serves waveform windows read from zarr files together with outcome labels, for forecasting whether an outcome occurs within forecast_window_sec of each window.
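
A hypothetical construction sketch; the column names follow the signature defaults, while the file path and outcome column are placeholders.

import pandas as pd
from torch.utils.data import DataLoader

outcome_df = pd.read_parquet("outcomes.parquet")   # placeholder outcome table
ds = ForecastingDataset(
    channels=["I", "II", "SPO2"],
    forecast_window_sec=10 * 60,            # predict outcome within 10 minutes
    outcome_df=outcome_df,
    outcome_df_outcome_col="event",         # hypothetical outcome column
    sample_seq_len_sec=60,                  # 1-minute input windows
)
dl = DataLoader(ds, batch_size=32, shuffle=True)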


source

BedmasterForecastingDataset


def BedmasterForecastingDataset(
    channels, # channels to use
    forecast_window_sec, # forecast window in seconds (outcome occurs within this window); 5, 10, or 15 minutes suggested
    outcome_df, # pandas dataframe containing outcomes for zarr files
    outcome_df_outcome_col, # outcome column in outcome_df
    file_col:str='file_path', # column indicating zarr file path
    y_date_column:str='date', # column indicating date of sample collection
    outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
    outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
    sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
    frequency:int=125, # frequency of underlying data
    butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
    median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    constant_nan_tolerance:float=0.5, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
    require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
    infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
    normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
    sample_frequency_key:str='sample_rate', # zarr attribute key storing the sampling frequency
    calibrations:dict={'ART1': 0.2, 'ART2': 0.2, 'I': 2.44, 'II': 2.44, 'III': 2.44, 'V': 2.44} # per-channel calibration factors applied to the raw signal values
):

A Bedmaster-specific counterpart of ForecastingDataset with the same interface; it reads the sampling frequency from the sample_rate attribute and applies the per-channel calibrations to the raw signal values.


source

BedmasterForecastingDatasetExtended


def BedmasterForecastingDatasetExtended(
    channels, # channels to use
    forecast_window_sec, # forecast window in seconds (outcome occurs within this window); 5, 10, or 15 minutes suggested
    outcome_df, # pandas dataframe containing outcomes for zarr files
    outcome_df_outcome_col, # outcome column in outcome_df
    patch_heartbeats:bool=False, # indicator to patch channels based on heart beats of ECG index channel
    ecg_index:int=2, # index of ECG channel in channels list
    ecg_channel_name:str='II', # name of ECG channel in zarr file
    fill_missing_beats:bool=False, # indicator to fill missing heart beats with interpolated values vs leaving as 0s
    heartbeat_patch_len:int=256, # length in samples of each heartbeat patch
    channel_processing_functions:NoneType=None, # dictionary of channel processing parameters
    channel_quality_functions:NoneType=None, # dictionary of channel quality parameters
    file_col:str='file_path', # column indicating zarr file path
    y_date_column:str='date', # column indicating date of sample collection
    outcome_df_seconds_since_column:str='Time Stamp (seconds)', # column indicating how many seconds since beginning of waveform
    outcome_df_duration_column:str='event_length', # column indicating duration of outcome in seconds
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
    sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
    frequency:int=125, # frequency of underlying data
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    require_all_channels:bool=False, # indicator to require all channels to be present in the sample, if False, will return samples with any of the channels and 0s for the missing channels
    infer_forecast_windows:bool=True, # indicator to require all forecast windows to be present in the sample, if False, will return samples with any of the forecast windows and NAs for the missing forecast windows
    normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
    sample_frequency_key:str='sample_rate', # zarr attribute key storing the sampling frequency
    calibrations:dict={'ART1': 0.2, 'ART2': 0.2, 'I': 2.44, 'II': 2.44, 'III': 2.44, 'V': 2.44} # per-channel calibration factors applied to the raw signal values
):

An extension of BedmasterForecastingDataset that adds optional heartbeat patching driven by an ECG index channel, plus configurable per-channel processing and quality functions.


source

SelfSupervisedDataset


def SelfSupervisedDataset(
    zarr_files, # zarr files that include samples
    channels, # channels to use
    max_seq_len_sec:NoneType=None, # maximum sequence length (in seconds) to use (this is especially relevant when you are returning both stft and raw ts data to keep them in sync)
    sample_df:NoneType=None, # dataframe indicating which indices within each zarr file include a sample
    sample_seq_len_sec:NoneType=None, # if no sample_df, generate sequences of this length in seconds as one sample
    sample_stride_sec:NoneType=None, # if no sample_df, stride in seconds between successive samples from the same array; when sample_stride_sec == sample_seq_len_sec there is no overlap
    frequency:int=125, # frequency of underlying data
    butterworth_filters:NoneType=None, # dictionary of low pass, high pass, and bandpass dictionary to perform on channels
    median_filter_kernel_size:NoneType=None, # size of median filter to perform on channels
    clip_interpolations:NoneType=None, # dictionary of channels:{'phys_range':..., 'percentiles':...} for filtering and interpolation of filtered values
    constant_nan_tolerance:float=0.2, # tolerance for nan values in the data - 0 means no nan allowed, 1 means 100% of nans allowed
    require_all_channels:bool=True, # indicator to require all channels to be present in the data
    normalize_signals:bool=True, # indicator to normalize signals to 0 mean and unit variance
    patch_heartbeats:bool=False, # indicator to patch channels based on heart beats of ECG index channel
    ecg_index:int=2, # index of ECG channel in channels list
    heartbeat_patch_len:int=256 # length in samples of each heartbeat patch
):

A map-style torch.utils.data.Dataset for self-supervised training: it serves waveform windows from zarr files without outcome labels, either as indicated by sample_df or as sliding windows of sample_seq_len_sec with stride sample_stride_sec.
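
A hypothetical setup generating sliding 30-second windows with a 10-second stride; zarr_paths is a placeholder list of zarr store paths.

ds = SelfSupervisedDataset(
    zarr_files=zarr_paths,          # assumed list of zarr store paths
    channels=["I", "II", "SPO2"],
    sample_seq_len_sec=30,          # 30-second windows
    sample_stride_sec=10,           # windows overlap by 20 seconds
    normalize_signals=True,
)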


source

nested_patches_ss_tensor_collate


def nested_patches_ss_tensor_collate(
    batch
):

Collate function for variable length sequences using NestedTensor.

Args:
    batch: List of tuples (X, Y) where:
        X: Tensor of shape (num_patches, channels, patch_len)
        Y: Tensor of shape (num_patches, channels, patch_len)

Returns:
    X_nested: NestedTensor containing the batch of sequences
    Y_nested: NestedTensor containing the batch of sequences
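
A minimal sketch of this collate using torch.nested, assuming each batch element is an (X, Y) pair of variable-length tensors:

import torch

def nested_patches_ss_tensor_collate(batch):
    # separate inputs and targets, then pack each ragged list into a NestedTensor
    xs, ys = zip(*batch)
    X_nested = torch.nested.nested_tensor(list(xs))
    Y_nested = torch.nested.nested_tensor(list(ys))
    return X_nested, Y_nested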


source

nested_patches_y_tensor_collate


def nested_patches_y_tensor_collate(
    batch
):

Collate function for variable length sequences using NestedTensor.

Args:
    batch: List of tuples (X, Y) where:
        X: Tensor of shape (channels, seq_len)
        Y: single outcome (bs)

Returns:
    X_nested: NestedTensor containing the batch of sequences
    Y: Tensor containing the batch of targets
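
Hypothetical wiring into a DataLoader; ds stands in for one of the dataset objects sketched above.

from torch.utils.data import DataLoader

dl = DataLoader(ds, batch_size=16, collate_fn=nested_patches_y_tensor_collate)
X_nested, Y = next(iter(dl))
# X_nested: NestedTensor of per-sample (channels, seq_len) tensors; Y: shape (16,)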