Joining with the CONCEPT table
It is currently not possible to perform joins in SQL, as the concepts are stored in a different database from the data.
This will be resolved as soon as possible by adding a CONCEPT table to the loaded database, containing only the concepts used by the loaded dataset (#135).
Console
Introduction
We have seen how to create a project using plugins to create widgets.
These widgets always include a tab for displaying or manipulating data through a graphical interface and a tab to modify the code behind the displayed result.
Modifying the code through the widget editor allows you to go beyond what the graphical interface offers. However, if you want to display the data in a different format (e.g., a data table instead of a ggplot2 figure), you are limited.
For this, there are two solutions:
- Use the Console plugin, a generic plugin that can display data in the desired format, such as a data table, a Plotly figure, or even a web interface generated with Shiny.
- Use the Console page, which is more accessible and allows you to easily test code snippets with loaded data.
Console plugin
If you don’t have this plugin, download it from the InterHop content catalog.
Create a widget with the Console plugin wherever you want, in a tab on the individual data or aggregated data page.
There’s no need to select concepts, as this plugin does not use them.
Once the widget is created, it works like other widgets, as detailed in Creating a Project > Creating Tabs and Widgets.
In the “Figure settings” section, select the language and output.
For example, for the R language, the following outputs are available:
- Console: displays the result as it would appear in the R console
- UI (HTML): displays the result in a Shiny
uiOutput
- Figure: corresponds to a Shiny
figureOutput
- Table: corresponds to a Shiny
tableOutput
- DataTable: corresponds to a
DT::DTOutput
from DT and Shiny - RMarkdown: displays the HTML output from rendering an RMarkdown file in the console
You can use different settings files, corresponding to different scripts.
Thus, the Console plugin can handle anything that plugins cannot (yet!) display.
Often, solving a problem with the Console plugin is the first step toward creating a plugin.
Here is an example of using the “Console” plugin to create a histogram showing the ages of patients in the dataset associated with the project.
Here is the code used if you’d like to test it:
d$visit_occurrence %>%
dplyr::left_join(
d$person %>% dplyr::select(person_id, birth_datetime),
by = "person_id"
) %>%
dplyr::collect() %>%
dplyr::mutate(
age = round(
as.numeric(
difftime(visit_start_datetime, birth_datetime, units = "days")
) / 365.25, 1
)
) %>%
ggplot2::ggplot(ggplot2::aes(x = age)) +
ggplot2::geom_histogram(binwidth = 5, fill = "#0084D8", color = "white") +
ggplot2::labs(
x = "Age (years)",
y = "Frequency"
) +
ggplot2::theme_minimal() +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 16, face = "bold"),
axis.title = ggplot2::element_text(size = 14)
)
Console page
The Console page is accessible from any page of the application by clicking the icon at the top of the screen.
Similar to the Console plugin, you can choose the programming language and output.
In addition to using R and Python, the Shell is also accessible, allowing you to display files and directories using commands like ls
, which is useful when working with a Docker container.
This console also aids in programming LinkR by making various operational variables of the application available. These variables are prefixed with r$
.
For example, r$users
displays the variable containing the users.
Access to this console can be restricted from the user management page.
Access to r$...
variables is not available in the Console plugin.
In both the Console plugin and the Console page, shortcuts are available:
- Ctrl|CMD + Shift + C: comment or uncomment the selected code.
- Ctrl/CMD + Enter: execute the selected code (executes all code if none is selected).
- Ctrl/CMD + Shift + Enter: execute all code.
Data variables
The main advantage of this console, whether via the Console plugin or the Console page, is the ability to manipulate OMOP tables from a dataset.
To load data, you can:
- Either load them from the “Dataset” page by selecting a dataset, navigating to the “Code” tab, and clicking “Run Code.”
- Or load a project associated with a dataset.
Once the data is loaded, they become accessible via variables prefixed by d$
(d stands for data).
All OMOP tables are available via these variables.
As you can see here, all tables are loaded lazily (indicated by question marks instead of the number of rows in the dataframe), meaning they are not loaded into memory.
This conserves resources and allows you to filter data before loading it into memory using dplyr::collect()
.
In the following example, we filter the data from the Measurement table for patient 13589912
before collecting it into memory.
The tables available in d$
are the complete tables, including all data from the loaded dataset.
Subsets of this data exist depending on the selected elements:
d$data_subset
: Contains all tables for the patients in the selected subset.d$data_person
: Contains data for the selected patient.d$data_visit_detail
: Contains data for the selected visit.
Each of these variables will include OMOP tables, such as d$data_person$measurement
, except for tables where this would not make sense (e.g., there is no d$data_person$person
table, as the d$person
table lists all patients).
For example, if in the currently open project, I selected the same patient as before (13589912
), I would retrieve with d$data_person$measurement
the same data as earlier when I filtered the global variable d$measurement
for this patient.
To retrieve the selected elements, I can use variables prefixed by m$
:
m$selected_subset
: Currently selected subset.m$selected_person
: Selected patient.m$selected_visit_detail
: Selected visit.
The concepts from OMOP terminologies are available in the d$concept
variable.
You can use the join_concepts
function to facilitate joins between variables.
d$measurement %>%
dplyr::filter(person_id == 13589912) %>%
join_concepts(d$concept, c(
"measurement", "measurement_type",
"operator", "unit"
)) %>%
dplyr::select(
person_id,
measurement_concept_id, measurement_concept_name,
value_as_number, unit_concept_name
) %>%
dplyr::collect()
SQL queries
For interoperability, it is necessary to query OMOP tables in SQL.
When you import data into LinkR, it always involves a database connection.
Indeed, either you use the “db” value for the data_source
argument, in which case you provide the con
object directly, representing the connection to the OMOP database, or you use the “disk” value for the data_source
argument. In this case, whether your data is in Parquet or CSV format, they are loaded by creating a DuckDB database.
Thus, as soon as data is loaded into LinkR, a d$con
connection object is created, allowing you to query your data in SQL.
The following code displays all data from the person table:
DBI::dbGetQuery(d$con, "SELECT * FROM person") %>% tibble::as_tibble()
This query retrieves the age of patients:
sql <- "
SELECT
v.visit_occurrence_id,
v.person_id,
ROUND(
EXTRACT(EPOCH FROM (
CAST(v.visit_start_datetime AS TIMESTAMP) -
CAST(p.birth_datetime AS TIMESTAMP)
)) / (365.25 * 86400),
1
) AS age
FROM
visit_occurrence v
LEFT JOIN
(SELECT person_id, birth_datetime FROM person) p
ON
v.person_id = p.person_id;
"
DBI::dbGetQuery(d$con, sql) %>% tibble::as_tibble()