Introduction

From SICDB Doc
Jump to navigation Jump to search

The SICdb dataset provides insight to over 27 thousand intensive care admissions, including therapy and data of their preceding surgery. Data was collected between 2013 and 2021 from 4 of the intensive care units at the University Hospital Salzburg, having more than 3 thousand intensive care admissions per year on 41 beds. The dataset is deidentified and contains, amongst others, case information, laboratory, medication, monitor and respirator signal data.SICdb provides aggregated once-per-hour and highly granular once-per-minute data.


The dataset, version 1.0.7, includes data from more than 27,350 admissions to the Department of Anesthesiology and Intensive Care Medicine at the General Hospital Salzburg and Paracelsus Medical University.

Data Tables

The SICdb dataset consists of billions of data entries across 7 data tables. The main table, "cases," contains a single entry for each intensive care admission and includes information about the patient (such as age, weight, and sex) and case details (such as diagnosis, scores, and ICD10 codes). The "TimeOfStay" field indicates the time from the first admission to a Metavision-enabled ward to the final closing of the case, including any preceding surgery. The "OffsetOfDeath" field indicates survival in seconds from admission. This data represents in-hospital mortality and may extend beyond the length of the intensive care stay or the current hospital stay. Personal data such as age, weight, and height have been grouped into bins of 5, with ages over 90 placed in the final bin.

All other data tables are related to the "cases" table through the "CaseID" field. Most data has timing information, with the "offset" field indicating the number of seconds from admission to the time of the event. The "laboratory" table contains laboratory values and the "medication" table provides data on administered drugs. There are several generic data tables that contain data sorted by type. The "data_ref" table contains additional nominal/categorical data, one entry per admission, and the "data_range" table documents items with a start and end time, such as data on central lines or drainages. The "data_float_h" table contains float data, aggregated once per hour, and includes most signal data. To reduce table size, minute data is serialized as a stream of IEEE 754 floats in the "data_float_h.rawdata" field. For further instructions on using this data, see the documentation found in Documentation.pdf or online [7], and refer to the unpacking script example provided on our GitHub repository [8].

Reference Table

Nominal data is encoded, and the reference table "d_references" provides additional information about the associated field. The referenced fields in all data tables correspond to the primary key of the "d_references" table, "ReferenceGlobalID." The "ReferenceValue" field in "d_references" gives the variable's value, and the "ReferenceUnit" field holds the unit or measurement, if applicable.

Data Format

GZip-compressed RFC 4180 comma-separated files are provided. The most current documentation, including table schemas, can be found online , and an offline copy is included in the files under the name Documentation.pdf. A GitHub repository has been created to share code, report issues, and discuss the dataset.