Import data

A tutorial for importing data from various sources: databases, Parquet, CSV…

Create a dataset

To import data, navigate to the Datasets page from the top menu or from the widget on the home page.

Then, click on the Plus (+) icon on the left side of the screen to create a new dataset.

Choose a name. For this example, we will import the MIMIC-IV demo dataset.

For more information about the MIMIC database, click here.

Once the dataset is created, click on its widget and go to the Code tab on the right side of the screen.

You will see that R code has been automatically generated.

import_dataset function

To import data into LinkR, we use the import_dataset function.

Here are the arguments that this function can take.

Some arguments do not need to be modified, and we will use default values:

  • r, d: These are variables used to communicate information within the application; they must be passed as arguments so that they are available inside the function.
  • dataset_id: This is the ID of the current dataset. You can replace this argument with %dataset_id%, which will be substituted by the dataset ID.

You will need to modify these arguments:

  • omop_version: This is the version of OMOP for the data you will import. If you specify %omop_version%, the version indicated in the Summary tab will be used.
  • data_source: Indicates where the data comes from: "db" if the data comes from a database connection, "disk" if it is stored in local files.
  • data_folder: If you selected disk for the data_source argument, specify the folder containing the data here.
  • con: If you selected db for the data_source argument, specify the database connection variable here.
  • load_tables: By default, all OMOP tables will be loaded from the specified source. If you want to load only some of these tables, specify the tables to import here. For example, load_tables = c('person', 'visit_occurrence', 'visit_detail').

Connecting to a database

Connecting and reading data

You can import data through a database connection.

First, configure the connection object con using the DBI library, then use the import_dataset function.

To indicate that we are loading from a database, the data_source argument must be set to "db".

The con argument will take our con object as its value.

# Connection object. We'll go into detail below.
con <- DBI::dbConnect(...)

# Function to load data when the project loads
import_dataset(
    r, d, dataset_id = %dataset_id%, omop_version = "5.4",
    data_source = "db", con = con
)

This code will establish a connection to the database when the project loads.

Let’s now see how to configure the database connection.

PostgreSQL

con <- DBI::dbConnect(
    RPostgres::Postgres(),
    host = "localhost",
    port = 5432,
    dbname = "mimic-iv-demo",
    user = "postgres",
    password = "postgres"
)
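
The example above hard-codes the credentials. As a minimal sketch, the same settings could instead be read from environment variables so that the password does not appear in the dataset code. The helper pg_settings and the variable names (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD) are assumptions made for this illustration; they are not part of LinkR.

```r
# Hypothetical helper (not part of LinkR): collect the PostgreSQL settings
# from environment variables, falling back to the values used in this tutorial.
pg_settings <- function() {
  list(
    host = Sys.getenv("PGHOST", unset = "localhost"),
    port = as.integer(Sys.getenv("PGPORT", unset = "5432")),
    dbname = Sys.getenv("PGDATABASE", unset = "mimic-iv-demo"),
    user = Sys.getenv("PGUSER", unset = "postgres"),
    password = Sys.getenv("PGPASSWORD", unset = "")
  )
}

settings <- pg_settings()

# Show everything except the password
print(settings[setdiff(names(settings), "password")])

# With DBI and RPostgres installed and a server running, the connection
# could then be opened like this:
# con <- do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), settings))
```

The resulting con object is then passed to import_dataset exactly as in the hard-coded version.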

DuckDB

You can connect to a DuckDB database via the .db file.

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "/my_db_file.db", read_only = TRUE)
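
Before wiring a DuckDB file into import_dataset, it can help to check that the file opens and contains the expected tables. The sketch below is only an illustration: it builds a throwaway database in a temporary file (the person table and its content are made up), reopens it read-only as above, and lists its tables. It assumes the DBI and duckdb packages are installed.

```r
library(DBI)

# Throwaway database file, used only for this illustration
db_file <- tempfile(fileext = ".db")

# Create the file and write a small made-up table into it
con_rw <- dbConnect(duckdb::duckdb(), dbdir = db_file)
dbWriteTable(con_rw, "person", data.frame(person_id = 1:3))
dbDisconnect(con_rw, shutdown = TRUE)

# Reopen the file read-only, as in the connection shown above
con <- dbConnect(duckdb::duckdb(), dbdir = db_file, read_only = TRUE)

# List the tables visible through this connection
tables <- dbListTables(con)
print(tables)

# This sketch is only a check, so it closes its connection at the end
dbDisconnect(con, shutdown = TRUE)
```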

Complete example

# Connecting to the local PostgreSQL database
con <- DBI::dbConnect(
    RPostgres::Postgres(),
    host = "localhost",
    port = 5432,
    dbname = "mimic-iv-demo",
    user = "postgres",
    password = "postgres"
)

# Loading the data when the project starts
import_dataset(
    r, d, dataset_id = %dataset_id%, omop_version = %omop_version%,
    data_source = "db", con = con
)

Importing files

You can also import files without using a database connection.

To do this:

  • Specify disk for the data_source argument.
  • Specify the location of the files in the data_folder argument.

For example, let’s say the files for my database are in the folder /data/mimic-iv-demo/:

/data/mimic-iv-demo/
--- person.parquet
--- visit_occurrence.parquet
--- visit_detail.parquet
--- measurement.parquet

They can be loaded like this:

import_dataset(
    r, d, dataset_id = %dataset_id%, omop_version = "5.4",
    data_source = "disk", data_folder = "/data/mimic-iv-demo/"
)
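
Before calling import_dataset on a folder, you may want to check which files it actually contains. Here is a minimal sketch in base R. The helper list_omop_files is hypothetical (it is not part of LinkR), and it assumes that each file is named after its OMOP table, as in the listing above. The demo runs on a temporary folder with empty placeholder files.

```r
# Hypothetical helper (not part of LinkR): compare the Parquet/CSV files
# found in a folder with the tables we expect to load.
list_omop_files <- function(data_folder, expected_tables) {
  files <- list.files(data_folder, pattern = "\\.(parquet|csv)$", ignore.case = TRUE)
  found <- tools::file_path_sans_ext(files)
  list(found = sort(found), missing = setdiff(expected_tables, found))
}

# Demo on a throwaway folder containing two empty placeholder files
demo_folder <- file.path(tempdir(), "mimic-iv-demo")
dir.create(demo_folder, showWarnings = FALSE)
invisible(file.create(file.path(demo_folder, c("person.parquet", "visit_occurrence.parquet"))))

check <- list_omop_files(demo_folder, c("person", "visit_occurrence", "measurement"))
print(check$found)
print(check$missing)
```

In this demo, measurement is reported as missing because no file with that name exists in the folder.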

Loading specific tables

You can choose to import only certain tables from the data source using the load_tables argument.

Simply specify the tables to import in a character vector like this:

# Loading only the person, visit_occurrence, visit_detail, and measurement tables
tables <- c("person", "visit_occurrence", "visit_detail", "measurement")

# Adding the load_tables argument in import_dataset
import_dataset(
    r, d, dataset_id = %dataset_id%, omop_version = "5.4",
    data_source = "db", con = con,
    load_tables = tables
)
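
When the list of tables is written by hand, a typo or a table absent from the source is easy to miss. The sketch below shows one way to filter the requested tables before passing them to load_tables. The available vector is a made-up stand-in: with a real connection it could be obtained with DBI::dbListTables(con).

```r
# Tables we would like to load
requested <- c("person", "visit_occurrence", "visit_detail", "measurement")

# Stand-in for the tables present in the source (made up for this example);
# with a database connection this could be DBI::dbListTables(con)
available <- c("person", "visit_occurrence", "measurement", "concept")

# Requested tables that the source does not contain
unknown <- setdiff(requested, available)
if (length(unknown) > 0) {
  message("Not found in the source: ", paste(unknown, collapse = ", "))
}

# Keep only the tables that exist, in the order they were requested
tables <- intersect(requested, available)
print(tables)
```

The filtered tables vector can then be given to the load_tables argument as shown above.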

From the content catalog

You can also install a dataset from the content catalog.

This will allow you to download the code needed to load data, but only the code.

The data will not be downloaded: access to health data generally requires authentication.

Find the tutorial here.