Get Started

Introduction

The SICdb dataset and its documentation are, as of 04/24, in active development. We try to improve the project as fast as we can. Please contact us about any problem you find with the data or the software. We would love to help motivated researchers get as much as possible out of our dataset.

The SICdb dataset is provided as compressed .csv files; the per-minute values are consolidated even further. Refer to the File List for a detailed description of the files.
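To peek into one of the compressed files before building the full database, a minimal Python sketch could look like the one below. It assumes gzip compression, comma-separated columns, UTF-8 encoding and a header row; none of this is stated on this page, so check the File List for the actual format. The function name preview_rows is hypothetical.

```python
import csv
import gzip
from itertools import islice


def preview_rows(path, limit=5):
    """Return the first `limit` rows of a gzip-compressed CSV as dicts.

    Assumptions (not confirmed by the documentation): gzip compression,
    comma delimiter, UTF-8 text, and a header row in the first line.
    """
    # Mode "rt" decodes the compressed stream on the fly, so only the rows
    # that are actually requested get decompressed and held in memory.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        return list(islice(reader, limit))
```

Calling preview_rows with the path of one downloaded file returns a short list of dictionaries keyed by column name, which is usually enough to check the columns without loading a multi-gigabyte table.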

The SICdb dataset contains billions of entries, so building a database from it can be a challenge. We therefore provide a solution that is as simple as possible: a fully preconfigured, fast environment to access, explore and export SICdb data. Refer to the Quick Start chapter if you are familiar with the command line and Docker; otherwise skip to the Detailed Instructions for a more detailed reference.

Quick Start

Like other ICU datasets, SICdb is huge. Expect your PC to need a significant amount of time to process it!

The database can be built using our Docker images. After installation, navigate into the folder containing all the data and run "docker compose up". When the environment is running, open http://localhost:5000 to install the dataset. The provided environment is fully preconfigured; just press start and wait. Due to the vast size of the dataset, this may take 4-12 hours*. When the installation has finished, the server has to be restarted: reload the page and press the restart button.

*) We are working on a solution to provide a fully indexed database. So far we have not found a legally safe way to distribute it (repository).
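Waiting for the web interface to come up after "docker compose up" can be scripted. The following Python sketch polls http://localhost:5000 until it answers. It is an illustration only: the function name wait_until_reachable, the timeout values and the injectable opener parameter are assumptions and not part of the SICdb environment.

```python
import time
import urllib.error
import urllib.request


def wait_until_reachable(url, timeout_s=300.0, interval_s=5.0,
                         opener=urllib.request.urlopen):
    """Poll `url` until it answers or `timeout_s` elapses.

    Returns True as soon as the server responds, False on timeout.
    `opener` is injectable so the function can be tested without a server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still proves that a server is listening.
            return True
        except (urllib.error.URLError, OSError):
            pass  # nothing is listening yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_until_reachable("http://localhost:5000") returns True once the page can be opened in a browser. It only checks reachability and says nothing about whether the dataset installation itself has finished.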

Detailed Instructions (Windows)

The installation will take some time; expect several hours. The process can be interrupted and will continue where it stopped.

If you need any help with the installation, please feel free to contact us!

Issues and Troubleshooting on Windows Systems

In general, the environment should run on all common operating systems and machines (we recommend >= 16 GB RAM and a modern multicore CPU).

Note: If "docker compose up" throws any errors, open the "Docker Desktop" app as administrator, navigate to "Containers" and press the play button in the "roodataenv" row.

Feel free to contact us if the problem persists.

Some issues are to be expected when the environment is run on a "standard" Windows machine. We recommend using the default configuration (WSL2), but there are some known bugs and issues which had not been resolved as of 11/2022. We have not faced any issues on a Linux machine.


  • Installing the database will cause WSL2 to reserve 50% of your machine's RAM, which is not released automatically. This is expected to be patched in the future, but for now, after installing the dataset, you may either close the container and run "wsl --shutdown" in an admin PowerShell (Docker Desktop will automatically restart WSL) or restart your computer.
  • Due to a known bug in WSL2, it may not be possible to run the environment on a software-mounted drive. This can cause issues when trying to launch the SICdb environment from a VeraCrypt/TrueCrypt drive. BitLocker is fully supported as far as we know.
  • Due to a known issue with the way Windows manages hosts, the SICdb environment may not be reachable on localhost:5000 or may not start at all. This can be resolved by closing Docker Desktop, running it as administrator and then restarting the engine. The restart option is found in the system tray (bottom right on Windows machines): right-click the Docker icon -> Restart.
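When the environment does not seem reachable on localhost:5000, a quick check from Python can help describe the problem before contacting us. The sketch below tries a plain TCP connection to port 5000 under two names for the local machine. The function names can_connect and probe_local_names are hypothetical, and trying "127.0.0.1" next to "localhost" is our assumption about a useful diagnostic, not a documented fix.

```python
import socket


def can_connect(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def probe_local_names(port=5000, names=("localhost", "127.0.0.1")):
    """Map each local host name to whether `port` is reachable under it."""
    return {name: can_connect(name, port) for name in names}
```

If probe_local_names() reports True for "127.0.0.1" but False for "localhost", the container is probably running and the problem may lie in host name resolution. If both are False, the environment is likely not started, and restarting the Docker engine as described above is the first thing to try.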