Verify that Docker works by opening a terminal (PowerShell or CMD on Windows) and running:
docker --version
Copy the Docker image from Docker Hub.
docker pull interhop/linkr:latest
Launch a container from this image.
docker run -p 3838:3838 interhop/linkr:latest
You can now access LinkR via the address localhost:3838.
You can also launch LinkR by changing the arguments of the run_app function (see next paragraph).
docker run \
-p 3838:3838 \
interhop/linkr:latest \
R -e "linkr::run_app(language = 'en', app_folder = '/root')"
Here are the arguments for run_app:
language: Application language. Can be "fr" for French or "en" for English. (character)
app_folder: Folder where application files will be stored in the container. (character)
authentication: Enable or disable user authentication (TRUE or FALSE). (logical)
username: Username to use for automatic login when authentication = FALSE. Ignored if authentication is enabled. (character)
local: If TRUE, the application runs in local mode without loading external files (e.g., from GitHub). (logical)
log_level: Log levels to display. Can include "info", "error", "event", or "none" to disable logs. (character vector)
log_target: Log destination: "console" or "app". (character)
port: Port used to run the Shiny application. (integer)
host: Host address to run the application (default "0.0.0.0"). (character)
loading_options: List of startup options, which can include named elements like page, project_id, load_data_page, subset_id, person_id. (list)
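As a rough sketch of how these arguments fit together when LinkR is launched from R rather than from the Docker command line, the call below collects them in a list and only starts the application if the linkr package is installed and the session is interactive. The argument names and types come from the descriptions above; the values (in particular the app_folder path and the "admin" username) are placeholders chosen for illustration, not defaults documented here.

```r
# Arguments named as in the descriptions above; all values are illustrative.
run_app_args <- list(
  language = "en",
  app_folder = path.expand("~/linkr"),  # placeholder folder
  authentication = FALSE,
  username = "admin",                   # placeholder username (assumption)
  local = FALSE,
  log_level = c("info", "error"),       # character vector
  log_target = "console",
  port = 3838L,                         # integer
  host = "0.0.0.0"
)

# Only launch when linkr is installed and we are in an interactive session.
if (interactive() && requireNamespace("linkr", quietly = TRUE)) {
  do.call(linkr::run_app, run_app_args)
}
```

Keeping the arguments in a list makes it easy to check their types before launching the application.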
To allow the container to access a specific folder on your host system (for example, /my_personal_folder/linkr), you can mount this folder in the container. This is done with the -v option when launching the container.
docker run \
-p 3838:3838 \
-v /my_personal_folder/linkr:/root \
interhop/linkr:latest \
R -e "linkr::run_app(language = 'en', app_folder = '/root')"
Here, the app_folder argument of the run_app function is set to /root, so the application files are saved in the container's /root folder, which is in fact the folder on your system that you mounted with the -v option.
Via RStudio / R Console
The remotes library will be necessary for LinkR installation. You can install it with this command:
install.packages("remotes")
Stable version
Install the latest stable version with this command:
From the home page or from the menu at the top of the page, go to the “Content Catalog” page.
Find Saint-Malo on the map and select InterHop.
You will see on the right side of the screen the home page of InterHop’s shared content: we can see that plugins and datasets are offered, among other things.
To access the details of this shared content, click on the “View Content” button at the bottom of the page.
We want to download the data from the “MIMIC-IV demo set” dataset.
So we will click on the “Datasets” tab, at the top right of the screen, then click on the widget that corresponds to our dataset.
All that’s left is to click on “Install”, and that’s it, the dataset (at least its code) is installed!
So this catalog allows sharing data?
No, no data is downloaded when installing a dataset.
Health data, which is sensitive data, must be handled within a very specific regulatory framework.
Here, we have downloaded the code that allows access to the data, and not the data itself.
In our case, the data is anonymized and therefore not subject to GDPR. This is why the code can download the data of the 100 patients from the MIMIC-IV test database without any authentication.
You can read this article for more information about the MIMIC database.
Let’s now see how to import a project.
Import a Project
We will proceed in the same way to import a project.
Click on the “Projects” tab at the top right of the screen, click on the “LinkR Demo” project, then install the project.
Launch the Project
In order to launch our project, we must associate the project and the dataset.
To do this, go to the “Projects” page, from the first icon from the left at the top of the screen or from the home page (which is accessed by clicking on the LinkR icon at the top left of the screen).
Then click on the project to launch it.
Click on the “Data” tab at the top right of the screen.
Select the dataset we previously installed: “MIMIC-IV demo set”.
Click on the “Play” icon to load the data.
When the dataset code runs for the first time, it downloads the CSV files from the database, which may take a few minutes. Subsequent loads will be faster because the files are stored locally and do not need to be downloaded again.
You can now access the data:
Either from the project’s “Summary” page, with the “Individual Data” or “Aggregated Data” icons
Or from the icon at the top of the screen
You should see this on the aggregated data screen.
Conclusion
In this tutorial, we have seen how to import data and a project from the content catalog.
These vocabularies must be imported into LinkR to display the names of the concepts corresponding to concept IDs (the concept_id columns in OMOP tables).
Data imported in the OMOP format often need to be cleaned using data cleaning scripts.
A common example is weight and height data, which often contain outliers due to how clinical software is designed, such as swapped weight and height fields.
Scripts to exclude such outliers are often created. LinkR facilitates sharing these scripts, which, because of the use of the common OMOP data model, can work across different datasets imported into LinkR.
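The weight and height case described above can be sketched in base R as follows. This is only an illustration on a toy table shaped like the OMOP MEASUREMENT table: the data values are made up, the plausibility bounds are assumptions, and the two concept IDs are assumed to be the standard "Body weight" and "Body height" concepts (please check them in ATHENA before relying on them). It is not the script LinkR or its catalog actually ships.

```r
# Toy data shaped like the OMOP MEASUREMENT table (values are made up).
# Patient 2 has swapped weight and height fields.
measurement <- data.frame(
  person_id = c(1, 1, 2, 2, 3, 3),
  measurement_concept_id = c(3025315, 3036277, 3025315, 3036277, 3025315, 3036277),
  value_as_number = c(72, 175, 182, 79, 80, 181)
)

# Assumed concept IDs (verify in ATHENA): body weight (kg) and body height (cm).
weight_concept_id <- 3025315
height_concept_id <- 3036277

# Assumed plausibility bounds for adult patients.
bounds <- data.frame(
  measurement_concept_id = c(weight_concept_id, height_concept_id),
  min_value = c(20, 100),
  max_value = c(300, 250)
)

# Attach the bounds to each row, then drop rows outside their bounds.
with_bounds <- merge(measurement, bounds, by = "measurement_concept_id", all.x = TRUE)
is_outlier <- !is.na(with_bounds$min_value) &
  (with_bounds$value_as_number < with_bounds$min_value |
     with_bounds$value_as_number > with_bounds$max_value)
cleaned <- with_bounds[!is_outlier, c("person_id", "measurement_concept_id", "value_as_number")]
print(cleaned)
```

Note that simple bounds remove the implausible height of 79 cm but keep the weight of 182 kg, which is within range; a more careful script would compare both fields for the same patient.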
Other script examples include:
Calculating scores, such as APACHE-II or the SOFA score
Calculating urine output by summing different parameters (e.g., urinary catheter, nephrostomy, etc.)
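The urine output example in the list above could look roughly like this. The sketch sums, per patient and per day, the values recorded under several source parameters. The table is a toy stand-in for an OMOP table, and the concept IDs (9000001, 9000002, 9000003) are deliberately fake placeholders, not real OMOP concepts.

```r
# Toy data; concept IDs are fake placeholders, not real OMOP concepts.
measurement <- data.frame(
  person_id = c(1, 1, 1, 2, 2),
  measurement_concept_id = c(9000001, 9000002, 9000001, 9000001, 9000003),
  measurement_date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02",
                               "2024-01-01", "2024-01-01")),
  value_as_number = c(800, 300, 1200, 950, 100)  # volumes in mL
)

# Placeholder IDs standing for "urinary catheter" and "nephrostomy" outputs.
urine_concept_ids <- c(9000001, 9000002)

# Keep only the urine-related rows, then sum per patient and per day.
urine <- measurement[measurement$measurement_concept_id %in% urine_concept_ids, ]
daily_urine_output <- aggregate(
  value_as_number ~ person_id + measurement_date,
  data = urine,
  FUN = sum
)
print(daily_urine_output)
```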
A project is an R and Python environment where data is analyzed.
A project may correspond to a study (e.g., a study on mortality prediction) or to data analysis outside a study, such as creating dashboards (e.g., a dashboard visualizing hospital department activity).
When creating a project, the user selects the data to use from the datasets loaded into the application.
The project will center around two main pages:
Individual data page: here, users can recreate the equivalent of a clinical record by creating tabs configured with widgets. For example:
A “Hemodynamics” tab to configure widgets displaying heart rate, blood pressure, and antihypertensive treatments received by the patient.
A “Notes” tab to display all textual documents related to the patient (e.g., hospital reports, daily clinical notes).
An “Infectiology” tab to display all data related to infectiology (e.g., microbiological samples, antibiotics received).
etc.
Aggregated data page: here, users can similarly create tabs to configure widgets for group analyses, such as:
A “Demographics” tab displaying demographic data for a group of patients (e.g., age, sex, length of stay, mortality).
An “Outlier Data” tab showing distributions of various parameters and excluding outliers.
A “Survival Analysis” tab with a widget configured for population survival analysis.
etc.
Using the low-code interface (which combines a code interface and a graphical interface), collaboration between data scientists, statisticians, and clinicians becomes easier.
Plugins are blocks of R and Python code that add functionalities to LinkR.
As described earlier, projects are structured with tabs.
These tabs contain widgets, which are plugins applied to data.
For example, if I choose the “Timeline” plugin to be applied to the “Heart Rate” parameter, the resulting widget will be a timeline chart displaying the patient’s heart rate.
There are individual data plugins, which allow recreating a medical record. Examples include:
Document viewer: displays textual documents (e.g., hospital reports, clinical notes) and filters them (e.g., keyword search, title-based filters).
Timeline: displays temporal data as a timeline, as described above.
Datatable: displays data in a tabular format, such as lab results by sampling time.
etc.
We also have aggregated data plugins for visualizing and analyzing aggregated data, such as:
ggplot2: a plugin displaying variables using different charts from the ggplot2 library.
Survival analysis: conducts survival analysis.
Machine learning: trains and evaluates machine learning models using R or Python libraries.
A tutorial for importing data from various sources: databases, Parquet, CSV…
Create a dataset
To import data, navigate to the Datasets page from the top menu or from the widget on the home page.
Then, click on the Plus (+) icon on the left side of the screen to create a new dataset.
Choose a name. For this example, we will import the dataset MIMIC-IV demo set.
For more information about the MIMIC database, click here.
Once the set is created, click on the widget corresponding to this set and go to the Code tab on the right side of the screen.
You will see that R code has been automatically generated.
import_dataset function
To import data into LinkR, we use the import_dataset function.
Here are the arguments that this function can take.
Some arguments do not need to be modified, and we will use default values:
r, d: These are variables used to communicate information within the application; they should be passed as arguments to be available inside the function.
dataset_id: This is the ID of the current dataset. You can replace this argument with %dataset_id%, which will be substituted by the dataset ID.
You will need to modify these arguments:
omop_version: This is the version of OMOP for the data you will import. If you specify %omop_version%, the version indicated in the Summary tab will be used.
data_source: Indicates where the data comes from: "db" if the data comes from a database connection, "disk" if it is stored locally.
data_folder: If you selected disk for the data_source argument, specify the folder containing the data here.
con: If you selected db for the data_source argument, specify the database connection variable here.
load_tables: By default, all OMOP tables will be loaded from the specified source. If you want to load only some of these tables, specify the tables to import here. For example, load_tables = c('person', 'visit_occurrence', 'visit_detail').
Connecting to a database
Connecting and reading data
You can import data as part of a database connection.
First, configure the connection object con using the DBI library, then use the import_dataset function.
To indicate that we are loading a database, the data_source argument must be set to “db”.
The con argument will take our con object as its value.
# Connection object. We'll go into detail below.
con <- DBI::dbConnect(...)

# Function to load data when the project loads
import_dataset(
  r, d, dataset_id = %dataset_id%, omop_version = "5.4",
  data_source = "db", con = con
)
This code will establish a connection to the database when the project loads.
Let’s now see how to configure the database connection.
# Connecting to the local PostgreSQL database
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  host = "localhost",
  port = 5432,
  dbname = "mimic-iv-demo",
  user = "postgres",
  password = "postgres"
)

# Loading the data when the project starts
import_dataset(
  r, d, dataset_id = %dataset_id%, omop_version = %omop_version%,
  data_source = "db", con = con
)
Importing files
You can also import files without using a database connection.
To do this:
Specify disk for the data_source argument.
Specify the location of the files in the data_folder argument.
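For example, let’s say the files for my database are in the folder /data/mimic-iv-demo/. Following the argument descriptions above, the call would then look like this:
# Loading the data from local files when the project starts
import_dataset(
  r, d, dataset_id = %dataset_id%, omop_version = %omop_version%,
  data_source = "disk", data_folder = "/data/mimic-iv-demo/"
)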
You can choose to import only certain tables from the database using the load_tables argument.
Simply specify the tables to import in a character vector like this:
# Loading only the person, visit_occurrence, visit_detail, and measurement tables
tables <- c("person", "visit_occurrence", "visit_detail", "measurement")

# Adding the load_tables argument in import_dataset
import_dataset(
  r, d, dataset_id = %dataset_id%, omop_version = "5.4",
  data_source = "db", con = con, load_tables = tables
)
From the content catalog
You can also install a dataset from the content catalog.
This will allow you to download the code needed to load data, but only the code.
The data will not be downloaded: access to health data generally requires authentication.
Two types of vocabularies are used in the OMOP common data model:
Standard vocabularies, which are international reference vocabularies. These include:
LOINC for laboratory data and vital signs
SNOMED for diagnoses and procedures
RxNorm for prescriptions
Non-standard vocabularies, which are often international vocabularies but not exclusively. These are widely used, which is why they are included, even though they are non-standard. Examples include:
ICD-10 for diagnoses
CCAM, a French terminology for medical procedures
Both standard and non-standard vocabularies can be used in the OMOP data model. Standard concepts will be found in the _concept_id columns, while non-standard concepts will appear in the _source_concept_id columns. You should aim to use standard concepts as much as possible during the ETL process.
ATHENA
ATHENA is a vocabulary querying platform provided by OHDSI.
It allows you to search for concepts across all OMOP vocabularies using filters.
By clicking on the Download tab at the top of the page, you can download the vocabularies of your choice.
Start by deselecting all vocabularies by clicking on the checkbox at the top left of the screen, then select the vocabularies you wish to download.
For example, we will download the LOINC vocabulary.
Check the LOINC vocabulary box, then click on Download vocabularies at the top right.
Note that some vocabularies are not public and require a license to download.
Choose a name for the bundle, then click on Download. The site will indicate that the bundle is being created, which can take a few minutes as the server processes the SQL query to generate the CSV files and ZIP file.
Next, click on “Show history” and then “Download” to retrieve your bundle.
You will download a ZIP file containing one CSV file per vocabulary table (“VOCABULARY.csv,” “CONCEPT.csv,” etc.).
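If you want to inspect the bundle before importing it, the files can be read directly in R. The sketch below assumes that ATHENA exports are tab-delimited despite the .csv extension (adjust sep if your bundle differs). It writes a one-row toy CONCEPT.csv to a temporary folder so that it runs on its own; the heart rate row is illustrative and its concept ID should be checked in ATHENA.

```r
# Create a toy, tab-delimited CONCEPT.csv in a temporary folder
# (stand-in for the file extracted from the ATHENA ZIP bundle).
bundle_dir <- tempfile("athena_bundle_")
dir.create(bundle_dir)
concept_path <- file.path(bundle_dir, "CONCEPT.csv")

header <- c("concept_id", "concept_name", "domain_id", "vocabulary_id",
            "concept_class_id", "standard_concept", "concept_code",
            "valid_start_date", "valid_end_date", "invalid_reason")
row <- c("3027018", "Heart rate", "Measurement", "LOINC",
         "Clinical Observation", "S", "8867-4", "19700101", "20991231", "")
writeLines(c(paste(header, collapse = "\t"), paste(row, collapse = "\t")), concept_path)

# Read the file as text columns; quote = "" avoids problems with
# apostrophes or quotes inside concept names.
concept <- read.delim(concept_path, sep = "\t", quote = "",
                      stringsAsFactors = FALSE, colClasses = "character")

# Keep only the LOINC concepts.
loinc_concepts <- concept[concept$vocabulary_id == "LOINC", ]
print(loinc_concepts[, c("concept_id", "concept_name", "concept_code")])
```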
Importing vocabularies into LinkR
Now, all that remains is to import the vocabulary into LinkR.
To do so, go to the “Vocabularies” page, accessible from the homepage or via the link at the top of the page.
Then click on the “Import concepts or vocabularies” button in the sidebar.
Select either the ZIP file or the individual CSV files.
Click on “Import.” Done! We have successfully imported LOINC into LinkR.
Querying vocabularies in LinkR
Navigate to the application’s database page via the tab at the top right of the screen.
Go to the “Query the database” tab at the top right of the screen and select the option.
At the top right of the screen, select “Public DB.”
You can query the concept tables using SQL:
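As an illustration of the kind of query you might type there, the sketch below runs a simple SELECT against a toy concept table. To keep it self-contained, the table lives in an in-memory DuckDB database created from R (this requires the DBI and duckdb packages); in LinkR you would only enter the SQL itself. The two rows and their concept IDs are illustrative, and the actual tables and columns available in the public database should be checked in the application.

```r
# In-memory DuckDB database used as a stand-in for LinkR's public database.
con <- DBI::dbConnect(duckdb::duckdb())

# Toy concept table (illustrative rows; verify concept IDs in ATHENA).
DBI::dbWriteTable(con, "concept", data.frame(
  concept_id = c(3027018, 3004249),
  concept_name = c("Heart rate", "Systolic blood pressure"),
  vocabulary_id = c("LOINC", "LOINC"),
  stringsAsFactors = FALSE
))

# The SQL you would adapt: LOINC concepts whose name contains "rate".
sql <- "
  SELECT concept_id, concept_name
  FROM concept
  WHERE vocabulary_id = 'LOINC'
    AND concept_name LIKE '%rate%'
  ORDER BY concept_id
"
result <- DBI::dbGetQuery(con, sql)
print(result)

DBI::dbDisconnect(con)
```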
6 - Create a project
Now that everything is set up, we can create a project.
As we have seen previously, a project in LinkR is an R and Python environment where data is analyzed.
A project can correspond to a study (e.g., a study on mortality prediction), but it can also be used for data analysis outside studies, such as creating dashboards (e.g., a dashboard to visualize the activity of a hospital department).
Create a project
To get started, navigate to the project page, either from the top menu or the homepage.
Then, click “Create a project.”
Choose a name for your project and click “Add.”
Click on the project to open it.
You will land on the homepage of your project.
This homepage is divided into several tabs:
Summary: displays the main information related to the project, such as the author(s), project description, and an overview of the loaded data
Data: provides details about the data loaded in this project, such as the number of patients, visits, and rows per OMOP table
Data cleaning: where you configure the data cleaning scripts that will be applied to the data upon project load
Sharing: allows you to update a Git repository to share your project with the community
Note that the project name appears at the top of the screen. If you navigate to another project page (e.g., Individual data page) and click on the name, you will return to the project homepage.
To the right of the project name, several buttons appear:
Individual Data: access the page where you can configure data to create a patient file
Aggregate Data: access the page where cohort data can be visualized and analyzed
Concepts: search for concepts present in the imported dataset
Subsets: create patient subsets by filtering them based on specific criteria
Configure the project
Now that the project is created, let’s configure it.
First, we need to specify which dataset will be loaded when the project starts.
To do this, go back to the “Data” tab of your project.
Currently, no dataset is associated with the project.
If not already done, install the MIMIC-IV demo dataset from our Git repository by following this tutorial. You can find the InterHop Git repository by clicking its icon located on the map in Saint-Malo, France.
Select the dataset “MIMIC-IV demo set” from the dropdown menu, then click the “Load Data” button to the right of the dropdown.
The loaded data information will now be updated: 100 patients corresponding to 852 visits have been loaded.
Click on the patient count to display information about them.
Explore concepts
Once the data is loaded, we can see what it is composed of.
Each piece of information in an OMOP database is encoded using a concept, which belongs to a vocabulary. Each concept has a unique identifier that you can find via the ATHENA tool.
Concepts are stored in the _concept_id columns of various OMOP tables. To retrieve the mapping of each concept ID, you must import the necessary vocabularies into LinkR.
Once this is done, go to the Concepts page of the project via the icon to the right of the project name at the top of the screen.
You will arrive at this page. Select a vocabulary from the dropdown menu to load its concepts.
To obtain data in the OMOP format, you need to perform an ETL (Extract, Transform, and Load) process.
During this process, the data is transformed to fit the OMOP data model, and local concepts are aligned with ATHENA standard concepts. For example, a hospital’s heart rate code will be aligned with the standard “Heart rate” concept from the LOINC vocabulary.
This alignment process is lengthy and complex, as thousands of codes often need to be aligned manually.
This is why most OMOP datasets only partially align their concepts, and why you see both standard vocabularies (LOINC, SNOMED) and local ones (prefixed by mimiciv) in the dropdown menu above.
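The alignment step and its partial coverage can be pictured with a small base R sketch. The local codes and the mapping table below are invented for illustration, and the target concept IDs are assumed LOINC-based standard concepts to be verified in ATHENA. The only convention relied on is the OMOP one of using concept ID 0 when no standard concept matches.

```r
# Invented local hospital codes found in the source data.
local_codes <- data.frame(
  source_code = c("HR_MONITOR", "SBP_ART", "LOCAL_SCORE_X"),
  stringsAsFactors = FALSE
)

# Hand-made mapping to standard concepts (IDs are assumptions; verify in ATHENA).
mapping <- data.frame(
  source_code = c("HR_MONITOR", "SBP_ART"),
  target_concept_id = c(3027018, 3004249),
  stringsAsFactors = FALSE
)

# Left join: every local code is kept, mapped or not.
mapped <- merge(local_codes, mapping, by = "source_code", all.x = TRUE)

# OMOP convention: concept ID 0 means "no matching standard concept".
mapped$target_concept_id[is.na(mapped$target_concept_id)] <- 0

# Share of local codes aligned with a standard concept.
mapping_rate <- mean(mapped$target_concept_id != 0)
print(mapped)
print(mapping_rate)
```

Here two of the three local codes are aligned, so the third would remain visible only under its local vocabulary.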
If you haven’t yet imported the vocabularies, you must refresh the count of concepts by clicking the “Refresh count” icon at the top left of the screen.
Similarly, if you change the dataset associated with the project, you must refresh the concept count.
By selecting a vocabulary, you will see the different concepts from that vocabulary used in the dataset loaded for your project.
You will see the number of patients who have at least one instance of the concept in the “Patients” column and the number of rows across all OMOP tables associated with the concept in the “Rows” column.
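To make the two columns concrete, here is a minimal base R sketch that computes the same kind of counts on a single toy table. LinkR computes them across all OMOP tables and its own implementation is not shown here; the data and concept IDs are illustrative.

```r
# Toy MEASUREMENT-like table: one row per recorded value.
measurement <- data.frame(
  person_id = c(1, 1, 2, 3, 3, 3),
  measurement_concept_id = c(3027018, 3027018, 3027018, 3004249, 3004249, 3027018)
)

# "Rows": number of rows associated with each concept.
rows_per_concept <- aggregate(person_id ~ measurement_concept_id,
                              data = measurement, FUN = length)
names(rows_per_concept)[2] <- "rows"

# "Patients": number of distinct patients with at least one row for the concept.
patients_per_concept <- aggregate(person_id ~ measurement_concept_id,
                                  data = measurement,
                                  FUN = function(x) length(unique(x)))
names(patients_per_concept)[2] <- "patients"

concept_counts <- merge(patients_per_concept, rows_per_concept,
                        by = "measurement_concept_id")
print(concept_counts)
```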
When you click on a concept in the table, information related to that concept will appear on the right side of the screen.
Notably, you can retrieve the concept ID, which will be useful when querying OMOP tables. You can also view the distribution of the concept’s values in the loaded dataset.
You can filter concepts by their name using the menu at the top of the “Concept Name” column. You can also choose which table columns to display. These columns are from the OMOP CONCEPT table.
Create tabs and widgets
Now that we’ve loaded a dataset and explored its concepts, we can visualize and analyze this data using widgets.
To do so, go to the Individual data page, either from the project summary tab or from the icon at the top right of the screen, next to the project title (the one with a single individual).
You will arrive at the Individual data page, where you can recreate a patient record according to your project’s needs.
The menu on the left side of the screen allows you to:
Add tabs: tabs allow you to organize the various widgets
Add widgets: widgets are the building blocks of projects. They enable data visualization and analysis using plugins.
Edit the page: once widgets are created, you can rearrange them on the page. You can also modify or delete tabs.
Select patients: each subset contains multiple patients, and each patient has one or more visits (hospital stays or consultations)
It’s up to you to decide how to organize your project.
For the Individual Data page, it’s common to create a tab for each topic, such as a “Haemodynamics” tab for patient haemodynamic data or an “Infectiology” tab to display elements related to infectious issues: antibiotic treatments, microbiological samples, etc.
Let’s create a first tab, “Haemodynamics.” Click the “+ Tab” button on the left side of the screen and choose a name.
You will have a new empty tab. Tabs are displayed on the right side of the screen.
Now we can add different widgets to this tab. Click the “+ Widget” button on the left side of the screen.
You will need to:
choose a name
choose a plugin
choose concepts
A plugin is a script written in R and/or Python that adds functionalities to the application.
There are plugins specific to individual data, others for aggregated data, and some that are mixed.
Each plugin has a main functionality.
Some plugins are used to visualize a type of data, such as the plugin that allows you to visualize prescription data in a timeline format or the plugin that displays structured data in a table format.
Others are used to analyze data, such as the plugin that creates a logistic regression model or the one that trains machine learning models.
Every step of a data science project can be transformed into a plugin, saving time and improving data analysis quality. LinkR aims to offer more and more plugins thanks to its community’s contributions.
For example, we want to display the hemodynamic parameters of patients in a timeline format.
We will click on “Select a plugin,” then select the “Timeline {dygraphs}” plugin, which displays data as a timeline using the JavaScript library dygraphs.
If the plugin does not appear in the list, download it from the Content Catalog.
Now let’s select which concepts to display by clicking on “Select concepts.”
For example, we selected the concepts of heart rate and systolic, diastolic, and mean arterial pressures using the LOINC terminology.
Let’s choose a name, for example, “Haemodynamic Timeline,” and click “Add.” Our widget will appear on the page.
A widget will often appear in the same form, with three or four icons at the top of the widget, two buttons on the left, and the name of the settings file.
Let’s start with the menu at the top of the widget.
The icons are, from left to right:
Figure: displays the figure or the result the plugin is meant to show
Figure settings: configures the figure using a graphical interface
Figure code: edits the R or Python code that displays the figure
General settings: general widget settings, such as showing or hiding certain elements
Each widget works the same way: a graphical interface configures the figure. When parameters are modified, the corresponding R or Python code can be generated. Once this code is generated, it can be edited directly in the code editor, allowing you to go beyond what the graphical interface alone offers.
Widgets work with settings files, saving both figure parameters and figure code. This allows you to create multiple configurations for the same widget.
To select a settings file, click on the file name (here “No settings file selected”), then select the file from the dropdown menu.
To create a settings file, click the “+” icon on the same page, choose a name, and create the file. For this first example, we will name it “Haemodynamic set 1”
Once the file is created, the parameters saved on the “Figure settings” and “Figure code” pages will be saved in this file.
Before configuring our figure, let’s review the widget’s general settings.
In the Display section, we can choose to show or hide the selected settings file.
We can also choose to display the parameters or editor side-by-side with the figure. This divides the widget screen into two parts: the figure on the left and the parameters or code on the right, useful for quickly seeing the result of our parameters.
In the Code Execution section, we can choose to execute the code when loading a settings file: for example, when loading a project, the last selected settings file will be loaded, initializing all widgets on project load. You can also choose not to load a widget if it might take time to execute and is not immediately needed.
The option “Execute code when updating data” updates the figure when the patient changes if this widget uses patient-specific data.
We will choose to hide the settings file, display parameters or the editor side-by-side with the figure, and execute the code both when loading the settings file and when updating data.
The settings file name disappears, as does the figure icon: the figure will display in the “Figure settings” and “Figure code” tabs.
Don’t forget to save your general settings using the icon on the left of the widget. Widget general settings depend on the widget, not the settings file.
Before displaying our data, let’s adjust one last detail: make the widget larger.
To do this, click “Edit page” on the left of the screen. New icons will appear at the top right of the widget:
an icon to toggle the widget to full screen, useful during the widget configuration phase
an icon to modify the widget, such as changing the name or adding/removing concepts
an icon to delete the widget
There are also icons at all four corners to resize the widget.
Resize the widget to take up the full width of the screen and one-third of its height.
Then switch it to full-screen mode. Click “Validate changes” on the left of the screen to exit “Edit” mode.
Go to the “Figure settings” tab to configure our figure.
For this plugin, we have three options:
Data to display: should we display data for the selected patient or only for the selected stay?
Concepts: which concepts should we display? Here, we see the concepts we selected when creating the widget. You can choose to display only some of them.
Synchronize timelines: this can be useful for synchronizing different widgets.
Select “Patient data” in “Data to display,” then “Heart rate” from the concepts dropdown menu.
Click the “Save” icon on the left of the widget, then the “Display figure” icon (Play icon).
You will be prompted to select a patient: we hadn’t chosen one yet.
Start by selecting “All patients” from the “Subset” dropdown menu, then select any patient.
Since we chose to execute the code when data is updated, you should now see the selected patient’s heart rate displayed as a timeline.
Click “Edit page” again, then exit full-screen mode. Your widget should return to the dimensions you assigned: one-third of the page height and full width, ideal for this timeline.
You can zoom in on the figure and change the selected time interval.
Your turn!
Try now to:
Create a new settings file for the current widget, such as "Haemodynamic set 2"
Configure the widget to display heart rate and systolic, diastolic, and mean arterial pressures
Create a new widget with the "Data table" plugin to display the same concepts
Synchronize the timelines of both widgets
You should achieve something like this (example taken from the “LinkR Demo” project, which you can download from InterHop’s Content Catalog):
We have seen how to create tabs and widgets to build a patient record on the “Individual data” page.
The same principle applies to the “Aggregated data” page, except tabs generally correspond to steps in a research project, such as a widget for creating the study outcome, a widget for excluding outlier data, or a widget for training machine learning models.
Sharing the project
Once your project is configured, you can share it by integrating it into your Git repository directly from the application.
Go to the “Share” tab from the project’s main page (by clicking the project name in blue at the top of the page).
We have seen how to create a project using plugins to create widgets.
These widgets always include a tab for displaying or manipulating data through a graphical interface and a tab to modify the code behind the displayed result.
Modifying the code through the widget editor allows you to go beyond what the graphical interface offers. However, if you want to display the data in a different format (e.g., a data table instead of a ggplot2 figure), you are limited.
For this, there are two solutions:
Use the Console plugin, a generic plugin that can display data in the desired format, such as a data table, a Plotly figure, or even a web interface generated with Shiny.
Use the Console page, which is more accessible and allows you to easily test code snippets with loaded data.
The Console page is accessible from any page of the application by clicking the icon at the top of the screen.
Similar to the Console plugin, you can choose the programming language and output.
In addition to using R and Python, the Shell is also accessible, allowing you to display files and directories using commands like ls, which is useful when working with a Docker container.
This console also aids in programming LinkR by making various operational variables of the application available. These variables are prefixed with r$.
For example, r$users displays the variable containing the users.
Access to this console can be restricted from the user management page.
Access to r$... variables is not available in the Console plugin.
In both the Console plugin and the Console page, shortcuts are available:
Ctrl/CMD + Shift + C: comment or uncomment the selected code.
Ctrl/CMD + Enter: execute the selected code (executes all code if none is selected).
Ctrl/CMD + Shift + Enter: execute all code.
Data variables
The main advantage of this console, whether via the Console plugin or the Console page, is the ability to manipulate OMOP tables from a dataset.
To load data, you can:
Either load them from the “Dataset” page by selecting a dataset, navigating to the “Code” tab, and clicking “Run Code.”
Or load a project associated with a dataset.
Once the data is loaded, it becomes accessible via variables prefixed by d$ (d stands for data).
As you can see here, all tables are loaded lazily (indicated by question marks instead of the number of rows in the dataframe), meaning they are not loaded into memory.
This conserves resources and allows you to filter data before loading it into memory using dplyr::collect().
In the following example, we filter the data from the Measurement table for patient 13589912 before collecting it into memory.
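The pattern described above can be sketched as follows. Inside LinkR you would start from d$measurement; to keep the example runnable on its own, a toy lazy table is created here from an in-memory DuckDB database (this assumes the DBI, duckdb, dplyr and dbplyr packages are installed). The measurement values are made up.

```r
suppressPackageStartupMessages(library(dplyr))

# Stand-in for d$measurement: a lazy table backed by an in-memory DuckDB database.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "measurement", data.frame(
  person_id = c(13589912, 13589912, 10000032),
  measurement_concept_id = c(3027018, 3004249, 3027018),
  value_as_number = c(82, 121, 95)
))
measurement <- dplyr::tbl(con, "measurement")  # nothing is loaded in memory yet

# Filter first (translated to SQL), then collect only the matching rows.
patient_measurements <- measurement %>%
  dplyr::filter(person_id == 13589912) %>%
  dplyr::collect()

print(patient_measurements)
DBI::dbDisconnect(con)
```

In LinkR, the equivalent would be to replace the toy measurement object with d$measurement and keep the filter and collect steps unchanged.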
The tables available in d$ are the complete tables, including all data from the loaded dataset.
Subsets of this data exist depending on the selected elements:
d$data_subset: Contains all tables for the patients in the selected subset.
d$data_person: Contains data for the selected patient.
d$data_visit_detail: Contains data for the selected visit.
Each of these variables will include OMOP tables, such as d$data_person$measurement, except for tables where this would not make sense (e.g., there is no d$data_person$person table, as the d$person table lists all patients).
For example, if in the currently open project, I selected the same patient as before (13589912), I would retrieve with d$data_person$measurement the same data as earlier when I filtered the global variable d$measurement for this patient.
To retrieve the selected elements, I can use variables prefixed by m$:
m$selected_subset: Currently selected subset.
m$selected_person: Selected patient.
m$selected_visit_detail: Selected visit.
The concepts from OMOP terminologies are available in the d$concept variable.
You can use the join_concepts function to facilitate joins between variables.
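The signature of join_concepts is not described here. As a rough manual equivalent, a plain dplyr join between an OMOP table and the concept table could look like the sketch below. The two tables are hypothetical stand-ins for d$measurement and d$concept, with made-up IDs and values.

```r
library(dplyr)

# Hypothetical stand-ins for d$measurement and d$concept
measurement <- tibble::tibble(
  person_id = c(1, 1, 2),
  measurement_concept_id = c(101L, 102L, 101L),
  value_as_number = c(72, 36.8, 90)
)
concept <- tibble::tibble(
  concept_id = c(101L, 102L),
  concept_name = c("Heart rate", "Body temperature")
)

# Attach the concept name to each measurement row
measurement_named <- measurement %>%
  dplyr::left_join(
    concept %>%
      dplyr::select(
        measurement_concept_id = concept_id,
        measurement_concept_name = concept_name
      ),
    by = "measurement_concept_id"
  )

print(measurement_named)
```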
For interoperability, it must be possible to query OMOP tables in SQL.
When you import data into LinkR, it always involves a database connection.
Either you use the “db” value for the data_source argument, in which case you provide the con object directly, representing the connection to the OMOP database, or you use the “disk” value for the data_source argument. In the latter case, whether your data is in Parquet or CSV format, it is loaded by creating a DuckDB database.
Thus, as soon as data is loaded into LinkR, a d$con connection object is created, allowing you to query your data in SQL.
The following code displays all data from the person table:
DBI::dbGetQuery(d$con, "SELECT * FROM person") %>% tibble::as_tibble()
This query computes the age of patients at the start of each visit:
sql <- "
    SELECT
        v.visit_occurrence_id,
        v.person_id,
        ROUND(
            EXTRACT(EPOCH FROM (
                CAST(v.visit_start_datetime AS TIMESTAMP) -
                CAST(p.birth_datetime AS TIMESTAMP)
            )) / (365.25 * 86400),
            1
        ) AS age
    FROM
        visit_occurrence v
    LEFT JOIN
        (SELECT person_id, birth_datetime FROM person) p
    ON
        v.person_id = p.person_id;
"

DBI::dbGetQuery(d$con, sql) %>% tibble::as_tibble()
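To check the arithmetic of the age expression, the same formula can be reproduced in base R: the difference between the two timestamps in seconds, divided by the number of seconds in an average year (365.25 × 86400), rounded to one decimal. The dates below are made up for illustration.

```r
# Hypothetical dates, for illustration only
birth_datetime <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
visit_start_datetime <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Elapsed time in seconds (the counterpart of EXTRACT(EPOCH FROM ...) in the query)
elapsed_seconds <- as.numeric(
  difftime(visit_start_datetime, birth_datetime, units = "secs")
)

# Convert to years using an average year length of 365.25 days
age <- round(elapsed_seconds / (365.25 * 86400), 1)

print(age)
```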
Joining with the CONCEPT table
It is currently not possible to join the data tables with the CONCEPT table in SQL, as the concepts are stored in a different database from the data.
This will be resolved as soon as possible by adding a CONCEPT table to the loaded database, containing only the concepts used by the loaded dataset (#135).
8 - Create a subset
Analyze population subsets
Introduction
During a project, it is often necessary to work on a subset of the patient population within a dataset.
For example, it might be useful to create a subset “Included patients” containing only the patients ultimately included in the final analyses of a study.
Similarly, one could imagine creating a subset of patients with a certain diagnosis, included within a specific period, or exposed to a particular treatment.
All of this is possible through the creation of subsets.
Creating a Subset
To create a subset, go to the Subsets page. For this, you need to have loaded a project. Then click on the “Subsets” icon in the top menu, to the right of the loaded project name.
You will arrive on the project subsets page.
A subset is a selection of patients from a dataset, but it belongs to a project. If two projects use the same dataset, they will not share the same subsets.
A subset “All Patients” is created by default when a project is created.
To create a subset, click the “+” icon on the left side of the screen.
Choose a name, then click “Add.” For this example, we will create a subset containing patients aged over 50 years.
Click on the subset you just created: you will be taken to the page of the selected subset.
On the right side of the screen, there are two tabs:
Summary: displays the subset’s information, which can be modified (including the subset’s description).
Code: this tab allows you to modify the code and add or remove patients from a subset, which we will explore in the following sections.
Adding patients to a subset
To add patients to a subset, use the add_patients_to_subset function.
This function takes the following arguments:
patients: A numeric vector containing the IDs of the patients to add.
subset_id: The ID of the subset to which the patients will be added. In the subset code, use the %subset_id% tag, which is replaced with the ID of the selected subset when the code runs.
output, r, m, i18n, and ns: Arguments needed for data manipulation and error message display.
When a subset is created, code is automatically generated to add all patients to the subset.
This code will execute when the user clicks the button to run the code or if the subset is selected from the project (and it does not already contain patients).
We will modify this code to add patients aged over 50.
Let’s create the code to create a column with the patients’ ages.
The code editor of the selected subset allows you to test the code. We will extract the IDs of patients aged over 50. For now, we will comment out the add_patients_to_subset function.
Our code works, so we can store these IDs in a variable and then integrate them into the add_patients_to_subset function.
A message confirms that the patients have been successfully added to the subset.
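The steps above could be sketched as follows. This is an illustration under assumptions, not the code generated by LinkR: it uses hypothetical stand-in tibbles for d$person and d$visit_occurrence, and it assumes that age is measured at each patient's first visit. The call to add_patients_to_subset is left commented out because it only works inside LinkR; its argument names are those listed above.

```r
library(dplyr)

# Hypothetical stand-ins for d$person and d$visit_occurrence
person <- tibble::tibble(
  person_id = c(1, 2, 3),
  birth_datetime = as.POSIXct(
    c("1950-06-01", "1990-03-15", "1965-11-30"), tz = "UTC"
  )
)
visit_occurrence <- tibble::tibble(
  person_id = c(1, 2, 3, 3),
  visit_start_datetime = as.POSIXct(
    c("2020-01-10", "2020-02-01", "2010-05-05", "2021-07-01"), tz = "UTC"
  )
)

# Age in years at each patient's first visit (assumption: first visit is the reference date)
ages <- visit_occurrence %>%
  dplyr::group_by(person_id) %>%
  dplyr::summarize(first_visit = min(visit_start_datetime), .groups = "drop") %>%
  dplyr::left_join(person, by = "person_id") %>%
  dplyr::mutate(
    age = as.numeric(difftime(first_visit, birth_datetime, units = "days")) / 365.25
  )

# IDs of patients aged over 50
patients <- ages %>%
  dplyr::filter(age > 50) %>%
  dplyr::pull(person_id)

print(patients)

# Inside LinkR, these IDs would then be passed to the function:
# add_patients_to_subset(
#   patients = patients, subset_id = %subset_id%,
#   output = output, r = r, m = m, i18n = i18n, ns = ns
# )
```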
Graphical Interface
A graphical interface for filtering patients based on certain characteristics (age, gender, length of stay, hospitalization dates, or the presence of concepts such as diagnoses or treatments) does not exist yet, although it would be very useful.
This graphical interface will be developed in the next version.
Removing patients from a subset
To remove patients from a subset, use the remove_patients_from_subset function, which works like add_patients_to_subset with the same arguments, particularly patients and subset_id.
For example, after adding all patients to the subset, you could remove those aged 50 or younger.
How to create and apply data cleaning scripts to ensure data quality
This feature will be available in version 0.4.
10 - Create a plugin
How to create plugins to add functionalities to LinkR
Plugins are what allow you to visualize and analyze data using a low-code interface.
They are scripts written in R and Python that leverage the Shiny library to create the graphical interface.
Plugins enable the creation of any visualization or analysis on the data, as long as it is feasible in R or Python.
LinkR continues to evolve through plugins created by its community of users.
We will first look at how to create a simple plugin and then explore how to develop more complex plugins using the development template provided by InterHop.
10.1 - Create a simple plugin
Creating a simple plugin to display data as a histogram
Plugin specifications
We will create a graphical interface to visualize the distribution of a variable in the form of a histogram.
We need to make a first choice: is this a plugin for individual data (patient by patient) or aggregated data (across a group of patients)?
It is more common to want to visualize the distribution of a variable across a group of patients rather than for a single patient. Therefore, we will create a plugin for aggregated data.
Next, what should our graphical interface look like?
We will divide the screen into two sections: on the left, we will display the histogram, and on the right, we will configure the figure’s parameters. This will include a dropdown menu to select the variable and a field to set the number of bins in the histogram.
Now for the server side.
A histogram is not suitable for visualizing all types of data: it can display the distribution of numerical data and categorical data, provided the number of categories is not too large.
To simplify, we will only allow the display of numerical data. Thus, we will restrict the display to the variable d$measurement. Refer to the console documentation for more details on OMOP data variables.
When we change the histogram’s number of bins, the updates should only apply after validation, to avoid unnecessary calculations. We will also need to set bounds for possible values.
Let’s summarize the specifications of our plugin:
UI:
Histogram visualization on the left side of the screen
Parameters on the right side of the screen:
Variable to display
Number of bins in the histogram, with upper and lower bounds
Validation of changes
Server:
Only allow data from the d$measurement variable
Adjust the number of bins in the histogram based on the input value
Trigger the figure code execution after the validation button is clicked
Create the plugin
Navigate to the plugins page from the top menu.
To create a plugin, click the “+” icon on the left side of the screen.
Choose a name, such as “Histogram.”
Select the type of data: the plugin can be used for individual data (patient by patient), aggregated data (a group of patients), or both. For this example, we will choose “Aggregated Data.”
It is also possible to copy an existing plugin: we will see this in the next section when we create a plugin using the InterHop template.
Once the plugin is created, select it. You will arrive at the plugin summary page.
In the top-right corner, you will see that a plugin is divided into four tabs:
Summary: displays general information and the plugin description. This will be detailed in the last section: “Share the Plugin”.
Code: where we edit the scripts to create the plugin’s frontend and backend (see the next three sections)
Test: this tab allows you to test the plugin code with data
Share: this is where you can add the plugin to your Git repository to share it with the community
Plugin structure
Go to the Code tab.
By default, a plugin consists of these three files:
ui.R: contains the Shiny code for the user interface, detailed in the next section
server.R: contains the application’s backend, detailed in the “Server / backend” section
translations.csv: contains translations for the frontend and backend
UI - user interface / frontend
As seen in the diagram above, we want to split the plugin screen into two sections: the figure on the left and the figure parameters on the right.
Start by clicking on the ui.R file on the left side of the screen.
All of our user interface code should be placed inside a tagList function, which combines HTML tags using the R library Shiny.
To place two div elements side by side, they must be wrapped in a div with the attribute style = "display:flex;".
tagList(
    div(
        div(
            # Each id is wrapped in the ns function and includes a %widget_id% tag
            id = ns("split_layout_left_%widget_id%"),
            style = "margin:10px 5px; width:50%; border:dashed 1px;"
        ),
        div(
            id = ns("split_layout_right_%widget_id%"),
            style = "margin:10px 5px; width:50%; border:dashed 1px;"
        ),
        style = "display:flex; height: 100%;" # Displays the two div elements side by side
    )
)
Note that whenever an ID is assigned to an HTML element, it must include a %widget_id% tag. This will be replaced with the widget’s ID, ensuring unique IDs. Without unique IDs, issues arise if the same plugin is launched in two different widgets. In the case of duplicate IDs, the HTML page will fail to render.
Additionally, each ID is wrapped in the ns function (see the chapter on Shiny modules in the Mastering Shiny book for more information).
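To see why the %widget_id% tag yields unique IDs, here is a sketch of the substitution idea in base R. It is an illustration only, not LinkR's actual implementation; the helper name fill_widget_id and the widget IDs 12 and 13 are hypothetical.

```r
# Hypothetical helper: replace the %widget_id% tag with a concrete widget ID
fill_widget_id <- function(code, widget_id) {
  gsub("%widget_id%", as.character(widget_id), code, fixed = TRUE)
}

ui_code <- 'div(id = ns("split_layout_left_%widget_id%"))'

# The same plugin code used in two widgets yields two distinct element IDs
code_widget_12 <- fill_widget_id(ui_code, 12)
code_widget_13 <- fill_widget_id(ui_code, 13)

print(code_widget_12)
print(code_widget_13)
```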
Here, we’ve added borders to our div elements using border:dashed 1px; to visualize the divs, which are currently empty. We will remove these attributes later.
Click on the “Run plugin code” icon on the left side of the screen.
You will automatically be redirected to the “Test” tab, and you should see the following result.
We can clearly see the two div blocks side by side, each with a dashed border.
Now let’s add our histogram.
To do this, we use the plotOutput function, which we will modify on the server side to display our plot.
div(
    id = ns("split_layout_left_%widget_id%"),
    plotOutput(ns("plot_%widget_id%")), # Always wrap IDs in ns() and include a %widget_id% attribute
    style = "margin:10px 5px; width:50%; border:dashed 1px;"
)
Now let’s create the configuration for our figure, in the right-hand div.
As mentioned earlier, we want three elements:
a dropdown menu to select the variable to display.
a numeric input to set the number of bins in the histogram.
a button to display the figure with these parameters.
We will use the shiny.fluent library, which is used for the entire user interface of LinkR and is based on Fluent UI.
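Here are the functions we will use for our three elements: Dropdown.shinyInput for the dropdown menu, SpinButton.shinyInput for the numeric input, and PrimaryButton.shinyInput for the button.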
In plugins, you must prefix all functions with the library name. For example: shiny.fluent::Dropdown.shinyInput().
Let’s create the code to display the configuration elements for the figure.
div(
    # ID with ns and %widget_id%
    id = ns("split_layout_right_%widget_id%"),

    # div containing the title, in bold (strong), with a 10 px space between the title and the dropdown
    div(strong(i18np$t("concept")), style = "margin-bottom:10px;"),
    # Dropdown menu with the concepts
    div(shiny.fluent::Dropdown.shinyInput(ns("concept_%widget_id%")), style = "width:300px;"), br(),

    # Numeric input to select the number of bins in the histogram
    # With a value of 50, a minimum of 10, and a maximum of 100
    div(strong(i18np$t("num_bins")), style = "margin-bottom:10px;"),
    div(shiny.fluent::SpinButton.shinyInput(ns("num_bins_%widget_id%"), value = 50, min = 10, max = 100), style = "width:300px;"), br(),

    # Button to display the figure
    shiny.fluent::PrimaryButton.shinyInput(ns("show_plot_%widget_id%"), i18np$t("show_plot")),
    style = "margin: 10px 5px; width:50%;"
)
Click on “Run plugin code” again, and you should see the following result.
You may notice that the titles of the inputs are wrapped in the i18np$t function. This allows the elements to be translated based on the translations.csv file, which we will explore in the next section.
Translations
Translations should be added to the translations.csv file.
This file contains the following columns:
base: this is the keyword you will use in your code, which will be translated based on the selected language
en: the translation of the word into English
fr: the translation into French. Currently, only English and French are supported. Additional languages may be added in the future.
Click on the translations.csv file, and update it with the following translations.
base,en,fr
concept,Concept to show,Concept à afficher
num_bins,Number of bins,Nombre de barres
show_plot,Show plot,Afficher la figure
Run the code again. You should see the following result.
The keywords have been replaced with their French translations.
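Conceptually, translating a keyword is a lookup in this table: find the row whose base matches, then read the column for the active language. The sketch below reproduces that idea in base R with two of the rows above. The translate helper is hypothetical and is not the implementation behind i18np$t.

```r
# Two rows of the translations table, in the same layout as translations.csv
translations <- read.csv(
  text = "base,en,fr
num_bins,Number of bins,Nombre de barres
show_plot,Show plot,Afficher la figure",
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the translation of a keyword for a language,
# falling back to the keyword itself when no translation exists
translate <- function(key, language = "en") {
  row <- translations[translations$base == key, ]
  if (nrow(row) == 0) return(key)
  row[[language]][1]
}

print(translate("num_bins", "fr"))
print(translate("show_plot", "en"))
print(translate("unknown_key", "fr"))
```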
Now we will make everything dynamic by coding the backend!
Server / backend
Without the backend, the graphical interface is static, and nothing happens when you click on the buttons.
As we saw in the documentation for creating widgets, when we create a widget, we select the plugin to use as well as the concepts.
The selected concepts will be stored in the selected_concepts variable, which includes the following columns:
concept_id: the ID of the concept, either standard (found on Athena) or non-standard (in this case, greater than 2000000000 / 2B)
concept_name: the name of the concept
domain_id: the name of the OMOP Domain, often corresponding to the OMOP table (e.g., the ‘Measurement’ domain for the d$measurement variable)
vocabulary_id: the name of the terminology corresponding to the concept
mapped_to_concept_id & merge_mapped_concepts: not used in the current version of LinkR
To test a plugin and enable the backend to work, a project containing data must be loaded. For example, launch the project used for the quick start.
Next, to simulate the creation of a widget, we will select concepts to test our plugin.
On the left side of the screen, click the “Select concepts” button.
This will open the same menu used to select concepts when creating a widget.
For this example, select the Heart rate concept from the LOINC terminology, then click “Validate.”
Let’s test it: open the server.R file and copy the following code:
print(selected_concepts)
Rerun the plugin code, and you should see the following result.
You can see the backend output appear at the bottom of the screen. This only happens during plugin testing, which helps facilitate debugging. This output is hidden when plugins are used within projects.
The concept Heart rate is displayed along with its concept_id.
Now, let’s write the code to update the dropdown menu for concepts.
# Adding a row with the values 0 / "none"
concepts <-
    tibble::tibble(concept_id = 0L, concept_name = i18np$t("none")) %>%
    dplyr::bind_rows(selected_concepts %>% dplyr::select(concept_id, concept_name))

# Converting concepts to a list format
concepts <- convert_tibble_to_list(concepts, key_col = "concept_id", text_col = "concept_name")

# Adding a delay to ensure the dropdown updates after it is created
shinyjs::delay(500, shiny.fluent::updateDropdown.shinyInput(session, "concept_%widget_id%", options = concepts, value = 0L))
Update translations.csv to add the translation for none.
base,en,fr
concept,Concept to show,Concept à afficher
num_bins,Number of bins,Nombre de barres
show_plot,Show plot,Afficher la figure
none,None,Aucun
Several things to note:
We add a row with an empty concept, ’none’, which will help prevent errors if the dropdown menu is empty.
We use the convert_tibble_to_list function, which converts a tibble into a list, necessary for integration into a shiny.fluent input. The arguments are key_col for the column containing the concept code (‘concept_id’) and text_col for the column containing the text (‘concept_name’).
We add a 500 ms execution delay for the update using shinyjs::delay(). This ensures the dropdown is created in the UI before being updated.
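shiny.fluent dropdowns expect their options as a list of list(key = ..., text = ...) elements. The sketch below shows the kind of structure such a conversion produces, using a hypothetical base R helper named to_dropdown_options. It is not the code of convert_tibble_to_list, and the concept ID 1001 is made up.

```r
# Hypothetical stand-in for the concepts table built in the server code
concepts <- data.frame(
  concept_id = c(0L, 1001L),
  concept_name = c("None", "Heart rate"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one list(key, text) element per row
to_dropdown_options <- function(data, key_col, text_col) {
  lapply(seq_len(nrow(data)), function(i) {
    list(key = data[[key_col]][i], text = data[[text_col]][i])
  })
}

dropdown_options <- to_dropdown_options(
  concepts, key_col = "concept_id", text_col = "concept_name"
)

str(dropdown_options)
```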
Run this code, and you should now have a dropdown menu with the concepts we selected (in this case, Heart rate).
Now, all that’s left is to display our figure.
We will use the observeEvent function, which triggers code upon detecting an event.
observeEvent(input$show_plot_%widget_id%, {

    # The code inside this function will execute
    # every time the button with the id 'show_plot_%widget_id%' is clicked
    # (i.e., the "Show plot" button)
})
Important
Always add the %req% tag at the beginning of an observeEvent.
This tag will be replaced with code that ensures previous observers are invalidated when the widget is updated.
When editing plugins, every time you click “Run plugin,” previously created observers will be invalidated, preventing conflicts.
Here are the steps for our code:
Retrieve the selected concept from the dropdown menu
Ensure the concept belongs to a domain that can be displayed as a histogram. For simplicity, we will only allow the ‘Measurement’ domain.
Ensure the tibble of data filtered by the selected concept is not empty
Create the code for the histogram using ggplot
Update the output
observeEvent(input$show_plot_%widget_id%, {

    # Always add this tag at the start of an observer
    %req%

    # Protect the code in case of an error with a tryCatch
    tryCatch({

        # 1) Retrieve the selected concept from the dropdown menu
        selected_concept <-
            selected_concepts %>%
            dplyr::filter(concept_id == input$concept_%widget_id%)

        no_data_available <- TRUE

        # 2) Check if a concept is selected and if the domain_id equals 'Measurement'
        if (nrow(selected_concept) > 0 && selected_concept$domain_id == "Measurement"){

            # 3) Ensure the tibble of data filtered by this concept is not empty
            data <-
                d$measurement %>%
                dplyr::filter(measurement_concept_id == selected_concept$concept_id)

            if (data %>% dplyr::count() %>% dplyr::pull() > 0){

                # 4) Create the histogram code
                plot <-
                    data %>%
                    ggplot2::ggplot(ggplot2::aes(x = value_as_number)) +
                    # Use the number of bins from our input$num_bins_%widget_id%
                    ggplot2::geom_histogram(colour = "white", fill = "#377EB8", bins = input$num_bins_%widget_id%) +
                    ggplot2::theme_minimal() +
                    # Modify the X and Y axis labels
                    ggplot2::labs(x = selected_concept$concept_name, y = i18np$t("occurrences"))

                no_data_available <- FALSE
            }
        }

        # Display an empty graph if no data is available
        if (no_data_available){
            plot <-
                ggplot2::ggplot() +
                ggplot2::theme_void() +
                ggplot2::labs(title = i18np$t("no_data_available"))
        }

        # 5) Update the output
        output$plot_%widget_id% <- renderPlot(plot)

        # Error messages will appear in the R console
    }, error = function(e) cat(paste0("\n", now(), " - ", toString(e))))
})
Update translations.
base,en,fr
concept,Concept to show,Concept à afficher
num_bins,Number of bins,Nombre de barres
show_plot,Show plot,Afficher la figure
none,None,Aucun
occurrences,Occurrences,Occurrences
no_data_available,No data available,Pas de données disponibles
You should see the following result.
We are now visualizing the distribution of heart rate across all patients using the d$measurement variable.
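The SpinButton limits the value in the UI, but a server-side guard on the number of bins can also be useful. A minimal sketch, assuming a hypothetical helper named clamp_num_bins that is not part of LinkR, using the same bounds and default as the input (10, 100, and 50):

```r
# Hypothetical helper: keep the number of bins within [lower, upper],
# falling back to a default when the input is missing or not numeric
clamp_num_bins <- function(value, lower = 10L, upper = 100L, default = 50L) {
  value <- suppressWarnings(as.integer(value))
  if (length(value) != 1 || is.na(value)) return(default)
  max(lower, min(upper, value))
}

print(clamp_num_bins(5))
print(clamp_num_bins(250))
print(clamp_num_bins("30"))
print(clamp_num_bins(NULL))
```

Inside the observer, the bins argument of geom_histogram could then receive the clamped value rather than the raw input.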
Here are the three complete files:
ui.R:
tagList(
    div(
        div(
            id = ns("split_layout_left_%widget_id%"),
            plotOutput(ns("plot_%widget_id%")), # Always wrap IDs in ns() and include a %widget_id% attribute
            style = "margin:10px 5px; width:50%; border:dashed 1px;"
        ),
        div(
            # ID with ns and %widget_id%
            id = ns("split_layout_right_%widget_id%"),

            # div containing the title, in bold (strong), with a 10 px space between the title and the dropdown
            div(strong(i18np$t("concept")), style = "margin-bottom:10px;"),
            # Dropdown menu with the concepts
            div(shiny.fluent::Dropdown.shinyInput(ns("concept_%widget_id%")), style = "width:300px;"), br(),

            # Numeric input to select the number of bins in the histogram
            # With a value of 50, a minimum of 10, and a maximum of 100
            div(strong(i18np$t("num_bins")), style = "margin-bottom:10px;"),
            div(shiny.fluent::SpinButton.shinyInput(ns("num_bins_%widget_id%"), value = 50, min = 10, max = 100), style = "width:300px;"), br(),

            # Button to display the figure
            shiny.fluent::PrimaryButton.shinyInput(ns("show_plot_%widget_id%"), i18np$t("show_plot")),
            style = "margin: 10px 5px; width:50%;"
        ),
        style = "display:flex; height: 100%;" # Displays the two div elements side by side
    )
)
server.R:
# Adding a row with the values 0 / "none"
concepts <-
    tibble::tibble(concept_id = 0L, concept_name = i18np$t("none")) %>%
    dplyr::bind_rows(selected_concepts %>% dplyr::select(concept_id, concept_name))

# Converting concepts to a list format
concepts <- convert_tibble_to_list(concepts, key_col = "concept_id", text_col = "concept_name")

# Adding a delay to ensure the dropdown updates after it is created
shinyjs::delay(500, shiny.fluent::updateDropdown.shinyInput(session, "concept_%widget_id%", options = concepts, value = 0L))

observeEvent(input$show_plot_%widget_id%, {

    # Always add this tag at the start of an observer
    %req%

    # Protect the code in case of an error with a tryCatch
    tryCatch({

        # 1) Retrieve the selected concept from the dropdown menu
        selected_concept <-
            selected_concepts %>%
            dplyr::filter(concept_id == input$concept_%widget_id%)

        no_data_available <- TRUE

        # 2) Check if a concept is selected and if the domain_id equals 'Measurement'
        if (nrow(selected_concept) > 0 && selected_concept$domain_id == "Measurement"){

            # 3) Ensure the tibble of data filtered by this concept is not empty
            data <-
                d$measurement %>%
                dplyr::filter(measurement_concept_id == selected_concept$concept_id)

            if (data %>% dplyr::count() %>% dplyr::pull() > 0){

                # 4) Create the histogram code
                plot <-
                    data %>%
                    ggplot2::ggplot(ggplot2::aes(x = value_as_number)) +
                    # Use the number of bins from our input$num_bins_%widget_id%
                    ggplot2::geom_histogram(colour = "white", fill = "#377EB8", bins = input$num_bins_%widget_id%) +
                    ggplot2::theme_minimal() +
                    # Modify the X and Y axis labels
                    ggplot2::labs(x = selected_concept$concept_name, y = i18np$t("occurrences"))

                no_data_available <- FALSE
            }
        }

        # Display an empty graph if no data is available
        if (no_data_available){
            plot <-
                ggplot2::ggplot() +
                ggplot2::theme_void() +
                ggplot2::labs(title = i18np$t("no_data_available"))
        }

        # 5) Update the output
        output$plot_%widget_id% <- renderPlot(plot)

        # Error messages will appear in the R console
    }, error = function(e) cat(paste0("\n", now(), " - ", toString(e))))
})
translations.csv:
base,en,fr
concept,Concept to show,Concept à afficher
num_bins,Number of bins,Nombre de barres
show_plot,Show plot,Afficher la figure
none,None,Aucun
occurrences,Occurrences,Occurrences
no_data_available,No data available,Pas de données disponibles
Congratulations! You’ve just created your first plugin. You can now use it in a project and, most importantly, improve it.
The advantage of plugins is that anything possible in R or Python can be integrated into LinkR as a plugin.
This process can be challenging and requires knowledge of the Shiny library. To learn more about Shiny, we recommend the excellent book Mastering Shiny.
Plugins can quickly become complex, which is why we’ve created a development template to provide a solid and consistent base for coding more advanced plugins. We’ll explore this in the next chapter.
Share the plugin
Before sharing your plugin, it’s essential to document it so that users know what it does and how to use it.
To do this, go to the Summary page of your plugin. You’ll notice that the “Short Description” field on the left and the “Description” field on the right are empty.
Click the “Edit Information” button on the left side of the screen.
You can now edit the plugin information, including the authors who contributed to its creation and a short description that will appear on the plugin page.
For example, we might provide the following short description for our plugin: “A plugin for visualizing structured data as a histogram”.
You can also edit the Full description by clicking the icon in the top right of the screen.
This will open an editor where you can write the description in Markdown format.
Once you’ve finalized the description, click the “Save updates” icon on the right side of the screen.
To confirm changes to the plugin information, click the “Save” icon on the left side of the screen.
Now that your plugin information is complete, you can share it via the “Share” tab at the top right of the screen, following this tutorial.
10.2 - Advanced
11 - Content catalog
Access shared content from other teams
Installing an item
To get started, go to the “Content catalog” page, accessible from the homepage or the top menu.
By selecting a point on the map, you will see its description (corresponding to the README.md file of the Git repository).
Click the “View content” button to access the shared content provided by this team.
You can choose the category of content from the tabs at the top-right of the screen, including:
Projects
Plugins
Data cleaning scripts
Datasets
Clicking on a widget will take you to the description of that content.
You can install or update the item simply by clicking the “Install” or “Update” button.
Once the item is installed, you can access it locally from the corresponding page (Projects page, Datasets page, etc.).
To return to the map, click on “Git repositories” at the top of the screen.