Import data
Create a dataset
To import data, navigate to the Datasets page from the top menu or from the widget on the home page.
Then, click on the Plus (+) icon on the left side of the screen to create a new dataset.
Choose a name. For this example, we will import the MIMIC-IV demo dataset.
For more information about the MIMIC database, click here.
Once the dataset is created, click on its widget and go to the Code tab on the right side of the screen.
You will see that R code has been automatically generated.
The import_dataset function
To import data into LinkR, we use the import_dataset function.
Here are the arguments that this function can take.
Some arguments do not need to be modified, and we will use default values:
- r, d: These are variables used to communicate information within the application; they should be passed as arguments to be available inside the function.
- dataset_id: This is the ID of the current dataset. You can replace this argument with %dataset_id%, which will be substituted by the dataset ID.
You will need to modify these arguments:
- omop_version: This is the version of OMOP for the data you will import. If you specify %omop_version%, the version indicated in the Summary tab will be used.
- data_source: Indicate here where the data comes from: "db" if the data comes from a database connection, "disk" if it is stored locally.
- data_folder: If you selected "disk" for the data_source argument, specify the folder containing the data here.
- con: If you selected "db" for the data_source argument, specify the database connection variable here.
- load_tables: By default, all OMOP tables will be loaded from the specified source. If you want to load only some of these tables, specify the tables to import here. For example, load_tables = c('person', 'visit_occurrence', 'visit_detail').
Connecting to a database
Connecting and reading data
You can import data through a database connection.
First, configure the connection object con using the DBI library, then use the import_dataset function.
To indicate that we are loading from a database, the data_source argument must be set to "db".
The con argument takes our con object as its value.
# Connection object. We'll go into detail below.
con <- DBI::dbConnect(...)
# Function to load data when the project loads
import_dataset(
r, d, dataset_id = %dataset_id%, omop_version = "5.4",
data_source = "db", con = con
)
This code will establish a connection to the database when the project loads.
Let’s now see how to configure the database connection.
PostgreSQL
con <- DBI::dbConnect(
RPostgres::Postgres(),
host = "localhost",
port = 5432,
dbname = "mimic-iv-demo",
user = "postgres",
password = "postgres"
)
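The credentials above are hardcoded to keep the example readable. As a minimal sketch, assuming you would rather keep them out of the dataset code, the settings can be read from environment variables instead. The helper name pg_settings and the variable names PGHOST, PGPORT, PGDATABASE, PGUSER and PGPASSWORD are illustrative choices made for this sketch, not something LinkR requires.

```r
# Hypothetical helper (not part of LinkR): collect PostgreSQL settings from
# environment variables, falling back to the values used in the example above.
pg_settings <- function() {
  list(
    host = Sys.getenv("PGHOST", unset = "localhost"),
    port = as.integer(Sys.getenv("PGPORT", unset = "5432")),
    dbname = Sys.getenv("PGDATABASE", unset = "mimic-iv-demo"),
    user = Sys.getenv("PGUSER", unset = "postgres"),
    password = Sys.getenv("PGPASSWORD", unset = "")
  )
}

settings <- pg_settings()

# With a PostgreSQL server available, the connection could then be opened with:
# con <- do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), settings))
```

The connection call is left as a comment so that the sketch runs without a database server.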
DuckDB
You can connect to a DuckDB database via the .db file.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "/my_db_file.db", read_only = TRUE)
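Before pointing import_dataset at a DuckDB file, it can help to check which tables the file actually contains. The sketch below is an illustration under stated assumptions: it builds a throwaway database in a temporary file with a made-up person table (a stand-in for your real .db file), reopens it read-only as in the line above, and lists its tables with DBI::dbListTables. It only runs when the DBI and duckdb packages are installed.

```r
# Sketch only: skipped when the DBI and duckdb packages are not installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("duckdb", quietly = TRUE)) {
  # Throwaway database file standing in for "/my_db_file.db"
  db_file <- tempfile(fileext = ".db")

  # Create the file and add a tiny placeholder table (made-up data)
  con_rw <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_file)
  DBI::dbWriteTable(con_rw, "person", data.frame(person_id = 1:3))
  DBI::dbDisconnect(con_rw, shutdown = TRUE)

  # Reopen read-only, as in the example above, and list the tables
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_file, read_only = TRUE)
  tables_found <- DBI::dbListTables(con)
  print(tables_found)

  # Closed here only because this is a standalone check
  DBI::dbDisconnect(con, shutdown = TRUE)
}
```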
Complete example
# Connecting to the local PostgreSQL database
con <- DBI::dbConnect(
RPostgres::Postgres(),
host = "localhost",
port = 5432,
dbname = "mimic-iv-demo",
user = "postgres",
password = "postgres"
)
# Loading the data when the project starts
import_dataset(
r, d, dataset_id = %dataset_id%, omop_version = %omop_version%,
data_source = "db", con = con
)
Importing files
You can also import files without using a database connection.
To do this:
- Specify "disk" for the data_source argument.
- Specify the location of the files in the data_folder argument.
For example, let's say the files for my database are in the folder /data/mimic-iv-demo/:
/data/mimic-iv-demo/
--- person.parquet
--- visit_occurrence.parquet
--- visit_detail.parquet
--- measurement.parquet
Load them like this:
import_dataset(
r, d, dataset_id = %dataset_id%, omop_version = "5.4",
data_source = "disk", data_folder = "/data/mimic-iv-demo/"
)
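To see which tables a folder would provide before importing it, you can list its files. The following is a minimal sketch, assuming the files are named after their OMOP table with a .parquet extension, as in the listing above. The helper list_parquet_tables is hypothetical, and the demonstration uses empty placeholder files in a temporary folder instead of real data.

```r
# Hypothetical helper (not part of LinkR): derive table names from the
# Parquet files found in a folder, e.g. "person.parquet" gives "person".
list_parquet_tables <- function(data_folder) {
  files <- list.files(data_folder, pattern = "\\.parquet$")
  tools::file_path_sans_ext(files)
}

# Demonstration on a temporary folder with empty placeholder files
demo_folder <- file.path(tempdir(), "mimic-iv-demo")
dir.create(demo_folder, showWarnings = FALSE)
file.create(file.path(demo_folder, c("person.parquet", "visit_occurrence.parquet")))

found <- list_parquet_tables(demo_folder)
print(found)
```

On a real installation, you would call list_parquet_tables("/data/mimic-iv-demo/") instead.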
Loading specific tables
You can choose to import only certain tables from the database using the load_tables argument.
Simply specify the tables to import in a character vector like this:
# Loading only the person, visit_occurrence, visit_detail, and measurement tables
tables <- c("person", "visit_occurrence", "visit_detail", "measurement")
# Adding the load_tables argument in import_dataset
import_dataset(
r, d, dataset_id = %dataset_id%, omop_version = "5.4",
data_source = "db", con = con,
load_tables = tables
)
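If you are unsure whether every requested table exists in the source, you can filter the vector before passing it to load_tables. This is a sketch under assumptions: select_tables is a hypothetical helper, and the available vector is a hand-written stand-in for what DBI::dbListTables(con) would return on a live connection.

```r
# Hypothetical helper (not part of LinkR): keep only the requested tables
# that exist in the source, and report the ones that are missing.
select_tables <- function(wanted, available) {
  missing_tables <- setdiff(wanted, available)
  if (length(missing_tables) > 0) {
    message("Tables not found: ", paste(missing_tables, collapse = ", "))
  }
  intersect(wanted, available)
}

# Stand-in for DBI::dbListTables(con), so the sketch runs without a database
available <- c("person", "visit_occurrence", "measurement", "concept")
wanted <- c("person", "visit_occurrence", "visit_detail", "measurement")

tables <- select_tables(wanted, available)
print(tables)
```

The resulting tables vector can then be passed as load_tables = tables in the import_dataset call.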
From the content catalog
You can also install a dataset from the content catalog.
This will allow you to download the code needed to load data, but only the code.
The data will not be downloaded: access to health data generally requires authentication.
Find the tutorial here.