Main Page


The SICdb dataset, this documentation and our software package are, as of 05/23, in active development. We try to improve the project as quickly as we can! Please contact us about any problem you find with the data or the software; we would love to help motivated researchers get as much as possible out of our dataset.

Welcome to SICdb Documentation

The SICdb dataset provides insight into over 27 thousand intensive care admissions, including therapy data and data from the preceding surgery. Data were collected between 2013 and 2021 from 4 of the intensive care units at the University Hospital Salzburg, which together account for more than 3 thousand intensive care admissions per year on 41 beds. The dataset is deidentified and contains, amongst others, case information as well as laboratory, medication, monitor and respirator signal data. SICdb provides both aggregated once-per-hour data and highly granular once-per-minute data.


The dataset, version 1.0.6, includes data from 27,386 admissions to the Department of Anesthesiology and Intensive Care Medicine at the General Hospital Salzburg and Paracelsus Medical University.

Data Tables

The SICdb dataset consists of billions of data entries across 7 data tables. The main table, "cases," contains a single entry for each intensive care admission and includes information about the patient (such as age, weight, and sex) and about the case (such as diagnosis, scores, and ICD-10 codes). The "TimeOfStay" field gives the time from the first admission to a Metavision-enabled ward to the final closing of the case, including any preceding surgery. The "OffsetOfDeath" field gives the survival time in seconds from admission. It reflects in-hospital mortality and may therefore extend beyond the length of the intensive care stay or the current hospital stay. Personal data such as age, weight, and height have been grouped into bins of width 5, with ages over 90 placed in the final bin.
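Because "OffsetOfDeath" is stored in seconds from admission, a common first step is to convert it to days and derive a mortality flag for a chosen horizon. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that a missing value (represented here as None) means no in-hospital death was recorded; check the table schema in Documentation.pdf for how missing values are actually encoded.

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def survival_days(offset_of_death: Optional[float]) -> Optional[float]:
    """Convert OffsetOfDeath (seconds from admission) to days.

    Assumption: None means no in-hospital death was recorded.
    """
    if offset_of_death is None:
        return None
    return offset_of_death / SECONDS_PER_DAY


def died_within(offset_of_death: Optional[float], days: float) -> bool:
    """Hypothetical helper: True if a recorded death lies within `days` of admission."""
    d = survival_days(offset_of_death)
    return d is not None and d <= days
```

For example, `died_within(offset, 30)` yields a 30-day in-hospital mortality flag for a case.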

All other data tables are related to the "cases" table through the "CaseID" field. Most data have timing information, with the "offset" field indicating the number of seconds from admission to the time of the event. The "laboratory" table contains laboratory values and the "medication" table provides data on administered drugs. There are several generic data tables that contain data sorted by type. The "data_ref" table contains additional nominal/categorical data, one entry per admission, and the "data_range" table documents items with a start and end time, such as data on central lines or drainages. The "data_float_h" table contains float data aggregated once per hour and includes most signal data. To reduce table size, the once-per-minute values are serialized as a stream of IEEE 754 floats in the "data_float_h.rawdata" field. For further instructions on using this data, see the documentation found in Documentation.pdf or online [7], and refer to the unpacking script example provided on our GitHub repository [8].
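To illustrate what unpacking the "rawdata" field involves, here is a minimal sketch that turns a blob of IEEE 754 floats into per-minute (offset, value) pairs. It rests on several assumptions that the official unpacking script [8] should be used to confirm: the field has already been converted to raw bytes, values are 32-bit floats in little-endian byte order, and value i belongs to minute i after the row's hourly offset. The function name `unpack_rawdata` is hypothetical.

```python
import struct
from typing import List, Tuple


def unpack_rawdata(raw: bytes, row_offset: int) -> List[Tuple[int, float]]:
    """Decode a rawdata blob into (offset_seconds, value) pairs.

    Assumptions (verify against the official unpacking script):
      * values are 32-bit IEEE 754 floats, little-endian ("<f");
      * value i belongs to minute i after the row's offset.
    """
    if len(raw) % 4 != 0:
        raise ValueError("rawdata length is not a multiple of 4 bytes")
    count = len(raw) // 4
    values = struct.unpack(f"<{count}f", raw)
    # Each float is one minute later than the previous one.
    return [(row_offset + 60 * i, v) for i, v in enumerate(values)]
```

If the real data turn out to use a different width or byte order, only the format string passed to `struct.unpack` needs to change.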

Reference Table

Nominal data is encoded, and the reference table "d_references" provides additional information about the associated field. The referenced fields in all data tables correspond to the primary key of the "d_references" table, "ReferenceGlobalID." The "ReferenceValue" field in "d_references" gives the variable's value, and the "ReferenceUnit" field holds the unit of measurement, if applicable.
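Resolving an encoded field then amounts to a lookup keyed on "ReferenceGlobalID". The sketch below builds such a lookup from rows of "d_references" (as produced, for example, by `csv.DictReader`, so IDs are kept as strings) and renders a value with its unit. The helper names are hypothetical, and treating an empty "ReferenceUnit" as "no unit" is an assumption.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

RefIndex = Dict[str, Tuple[str, Optional[str]]]


def build_reference_index(rows: Iterable[Mapping[str, str]]) -> RefIndex:
    """Map ReferenceGlobalID -> (ReferenceValue, ReferenceUnit)."""
    index: RefIndex = {}
    for row in rows:
        unit = row.get("ReferenceUnit") or None  # empty string -> no unit (assumption)
        index[row["ReferenceGlobalID"]] = (row["ReferenceValue"], unit)
    return index


def decode(index: RefIndex, ref_id: str) -> str:
    """Return the human-readable value, with its unit in brackets if there is one."""
    value, unit = index[ref_id]
    return f"{value} [{unit}]" if unit else value
```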

Data Format

The data are provided as GZip-compressed, RFC 4180-compliant comma-separated files. The most current documentation, including table schemas, can be found online, and an offline copy is included in the files under the name Documentation.pdf. A GitHub repository has been created to share code, report issues, and discuss the dataset.
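Since the files are plain GZip-compressed CSV, they can be streamed row by row without first decompressing them to disk. A minimal standard-library sketch follows; the helper name `iter_rows` is hypothetical, and UTF-8 encoding and a header row with column names are assumptions.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows of a GZip-compressed RFC 4180 CSV file as dicts.

    Assumes UTF-8 text and a header row. newline="" lets the csv module
    handle quoted fields that contain line breaks, as RFC 4180 allows.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        yield from csv.DictReader(fh)
```

Usage might look like `for row in iter_rows("cases.csv.gz"): ...`, where the file name is only an example; use the names in your download. Streaming keeps memory use low, which matters for the larger tables.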

Follow the instructions in the Get Started documentation to begin!


The Dataset


The Dataset contains, amongst others, the following data:


All nominal fields are referenced in the d_references table.

The Database

While the raw CSV files of the Dataset allow any kind of analysis and processing, it may be more convenient to browse the data in a fully configured and optimized database.

The SICdb Team provides a software environment to quickly select, view and export data from the database in various formats.

RooDataServer is software created specifically to access medical datasets such as the SICdb Dataset.

--> Quick Start <--