Importing Data

How to import data from different sources: databases, Parquet, CSV…

Introduction

LinkR can manage multiple data sources, all in OMOP format.

These data sources can each be reused in multiple projects.


Data can be imported from the following sources:


  • A relational database (DuckDB, PostgreSQL...)
  • Parquet files
  • CSV files

It will soon be possible to import data in a custom format (data collection) and in FHIR format.

Create a Dataset

To import data, go to the Datasets page from the menu at the top of the screen.

Then click on the Plus (+) icon on the left side of the screen to create a new dataset.

Choose a name. For the example, we will import the MIMIC-III dataset.

For more information about the MIMIC database, go here.

Once the dataset is created, click on the widget corresponding to this dataset and go to the Code tab on the right side of the screen.

You will see that R code has been automatically generated.

This code gives you two examples of using the import_dataset function, which is described in detail below.

import_dataset Function

To import data into LinkR, we use the import_dataset function.

Here are the function arguments, which we will use depending on the type of data to import (file or database connection):

  • omop_version: This is the OMOP version of the data you are going to import (“5.3” or “5.4”)
  • data_folder: In the case of importing data from a folder, this is the folder containing the data
  • con: In the case of importing data from a database, this is the database connection object
  • tables_to_load: By default, all OMOP tables are loaded from the specified source. To load only some of them, list the tables to import here. For example, tables_to_load = c('person', 'visit_occurrence', 'visit_detail')
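
The argument rules above can be sketched as a small check to run before calling import_dataset. Note that check_import_args is a hypothetical helper, not part of LinkR, and the rule that exactly one of data_folder and con is supplied is an assumption drawn from the descriptions above rather than documented behavior.

```r
# Hypothetical helper (not part of LinkR): sanity-check the arguments
# before passing them to import_dataset().
check_import_args <- function(omop_version, data_folder = NULL, con = NULL,
                              tables_to_load = NULL) {
  # The argument list above names "5.3" and "5.4" as the OMOP versions
  if (!is.character(omop_version) || length(omop_version) != 1 ||
      !omop_version %in% c("5.3", "5.4")) {
    stop('omop_version must be "5.3" or "5.4"')
  }
  # Assumption: data comes either from a folder or from a connection, not both
  if (is.null(data_folder) == is.null(con)) {
    stop("supply exactly one of data_folder or con")
  }
  # tables_to_load is optional; when given, it should be a character vector
  if (!is.null(tables_to_load) && !is.character(tables_to_load)) {
    stop("tables_to_load must be a character vector")
  }
  invisible(TRUE)
}

# Passes silently: a valid version and a single data source
check_import_args("5.4", data_folder = "/data/mimic-iv-demo/")
```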

Database Connection

You can import data from a connection to a database.

First configure the connection object con with the DBI library, then use the import_dataset function.

Pass this object as the con argument.

# Connection object. We see this in detail below.
con <- DBI::dbConnect(...)

# Function to load data when loading the project
import_dataset(omop_version = "5.4", con = con)

This code will establish a connection to the database when loading a project using this dataset.

Here is an example with a connection to a local PostgreSQL database.

# Connection to local PostgreSQL database
con <- DBI::dbConnect(
    RPostgres::Postgres(),
    host = "localhost",
    port = 5432,
    dbname = "mimic-iv-demo",
    user = "postgres",
    password = "postgres"
)

# Loading data when launching the project
import_dataset(omop_version = "5.4", con = con)
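
Hard-coded credentials are fine for a local demo database. For a shared server, one option is to read the connection settings from environment variables, as in this minimal sketch. The helper pg_settings_from_env and the LINKR_PG_* variable names are hypothetical, chosen for illustration only; the defaults mirror the local example above.

```r
# Hypothetical helper: collect PostgreSQL settings from environment
# variables, falling back to the local demo values when a variable is unset.
# The LINKR_PG_* names are illustrative, not defined by LinkR.
pg_settings_from_env <- function() {
  list(
    host     = Sys.getenv("LINKR_PG_HOST", unset = "localhost"),
    port     = as.integer(Sys.getenv("LINKR_PG_PORT", unset = "5432")),
    dbname   = Sys.getenv("LINKR_PG_DBNAME", unset = "mimic-iv-demo"),
    user     = Sys.getenv("LINKR_PG_USER", unset = "postgres"),
    password = Sys.getenv("LINKR_PG_PASSWORD", unset = "")
  )
}

settings <- pg_settings_from_env()

# The connection itself needs a running server, so it is left commented here:
# con <- do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), settings))
# import_dataset(omop_version = "5.4", con = con)
```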

Import Files

You can also import files without going through a database connection.

For this, specify the file location in the data_folder argument.

For example, suppose the dataset files are in the /data/mimic-iv-demo/ folder:

/data/mimic-iv-demo/
--- person.parquet
--- visit_occurrence.parquet
--- visit_detail.parquet
--- measurement.parquet

Load them like this:

import_dataset(omop_version = "5.4", data_folder = "/data/mimic-iv-demo/")
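
Before pointing import_dataset at a folder, it can help to check which table files the folder contains. The sketch below uses only base R. The helper list_table_files is hypothetical, and it assumes each file is named after its OMOP table, as in the folder listing above.

```r
# Hypothetical helper: list the table names found in a data folder,
# assuming one Parquet or CSV file per OMOP table.
list_table_files <- function(data_folder) {
  files <- list.files(data_folder, pattern = "\\.(parquet|csv)$",
                      ignore.case = TRUE)
  # Strip the extension: "person.parquet" becomes "person"
  sort(unique(tools::file_path_sans_ext(files)))
}

# Demonstration on a temporary folder holding empty placeholder files
demo_folder <- file.path(tempdir(), "omop-demo")
dir.create(demo_folder, showWarnings = FALSE)
file.create(file.path(demo_folder,
                      c("person.parquet", "visit_occurrence.parquet")))

list_table_files(demo_folder)
```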

Function Execution

Once your script is configured, you can execute the code with the “Execute” button on the left side of the screen (Play icon), or using the CTRL/CMD + SHIFT + ENTER shortcut.

If the data is imported correctly, the row count for each table is displayed.

If an error occurs, the error message is displayed in the output field on the right side of the screen.

This script is executed every time you load a project that uses this dataset.

You can create more complex scripts, which will for example download data from an external source, transform it and save it locally. This is for example what we do with the script to download the MIMIC-IV demo database.
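
As a minimal sketch of such a script, the helper below downloads a file only when no local copy exists, so that later project loads reuse the saved file. The helper fetch_once and the example.org URL are placeholders, not the actual MIMIC-IV demo script, and the transformation step is left out.

```r
# Hypothetical helper: download a file once and reuse the local copy
# on every later project load.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps binary files such as Parquet intact on Windows
    utils::download.file(url, dest, mode = "wb")
  }
  dest
}

# Demonstration: the file already exists, so nothing is downloaded
existing <- file.path(tempdir(), "person.parquet")
file.create(existing)
fetch_once("https://example.org/person.parquet", existing)

# In a dataset script, the import would follow the download step:
# import_dataset(omop_version = "5.4", data_folder = "/data/mimic-iv-demo/")
```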

Load Specific Tables

You can choose to import only certain tables from the database, with the tables_to_load argument.

Specify the tables to import in a character vector, like this:

# Loading only person, visit_occurrence, visit_detail and measurement tables
tables <- c("person", "visit_occurrence", "visit_detail", "measurement")

# Adding the tables_to_load argument in import_dataset
import_dataset(omop_version = "5.4", con = con, tables_to_load = tables)
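
A misspelled table name is easy to miss, so it can be worth comparing the requested tables with those the source actually provides. This is a sketch under assumptions: missing_tables is a hypothetical helper, and the available vector is hard-coded here for illustration. With a live connection it could come from DBI::dbListTables(con).

```r
# Hypothetical helper: return the requested tables that the source lacks
missing_tables <- function(requested, available) {
  setdiff(requested, available)
}

tables <- c("person", "visit_occurrence", "visit_detail", "measurement")

# Illustrative list; with a real connection, use DBI::dbListTables(con)
available <- c("person", "visit_occurrence", "measurement", "concept")

missing_tables(tables, available)
# "visit_detail"
```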

From the Content Catalog

You can also install a dataset from the content catalog.

This downloads the code that loads the data, and only the code: data is never downloaded from the content catalog.

Find the tutorial here.

Conclusion

We have seen how to import data, from a database or from files.


Let's now see how to use this data within a project.

Last modified 2025.06.22: Update fr/docs/create_subset (4d84c19)