Variables, concept sets, and anchors

Summary

After defining the study population, you need to specify what data to collect and when. This article covers three parts of the Study Designer: concept sets (grouping medical codes), temporal anchors (defining reference dates), and variables (configuring each data point with its time window and aggregation function).

Concept sets

The Concept sets section lets you create reusable sets of medical codes. A concept set groups codes from standardized terminologies (ICD-10, SNOMED CT, LOINC, ATC…) that describe the same clinical concept — for example, all codes for a diagnosis of “sepsis,” or all antibiotic prescription codes. These terminologies were introduced in the article on terminologies.

These sets are then used in the selection criteria (previous article) and in the definition of variables (below).

Interoperability and multicenter extension

Defining your variables and criteria using standardized terminologies from the study design stage ensures your protocol’s interoperability. If you later consider a multicenter extension, this groundwork will save considerable time: each center can use the same concept sets without having to recreate the definitions.

Creating a concept set

Three methods are available to create a concept set:

Browse catalog — opens a catalog of concept set dictionaries. A dictionary groups predefined, validated, and ready-to-use concept sets. You select the relevant sets for your study and import them in one click.
Import from URL — imports a concept set from an external link, useful for accessing dictionaries not yet listed in the built-in catalog.
Create manually — creates an empty set to which you add concepts one by one.

The INDICATE Data Dictionary

The catalog currently includes the INDICATE Data Dictionary, a dictionary of standardized concept sets developed as part of the European INDICATE project. This dictionary provides a level of abstraction above raw terminologies: rather than handling hundreds of individual LOINC or SNOMED CT codes on ATHENA or ATLAS, you work directly with clinical variables such as “Heart rate”, “Creatinine”, or “Type 2 diabetes”, each associated with expert-curated concept sets. This level of abstraction — the clinical variable, not the individual code — is what research protocols use to define which data to collect. Additional dictionaries could be integrated over time, created by learned societies or specialized working groups (oncology, genetics, cardiology…).

Managing concepts

Once a concept set is created, click on it to open it. You can then:

View the list of included concepts, showing for each one its name, vocabulary (ICD-10, LOINC…), code, and identifier
Include or exclude each concept from the set (an excluded concept will not be used in queries)
Add new concepts (by searching on ATHENA or ATLAS)
Remove existing concepts
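The include/exclude mechanism above can be pictured as a small data structure. The following Python sketch is illustrative only: the `Concept` and `ConceptSet` classes, their fields, and the numeric identifiers are hypothetical and do not describe the Study Designer's internal model. It shows the one rule stated above, that an excluded concept stays in the set but is left out of queries, while a removed concept disappears entirely. The LOINC codes serve as examples.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: int         # hypothetical identifier
    name: str
    vocabulary: str         # e.g., "LOINC", "ICD10", "SNOMED"
    code: str
    included: bool = True   # excluded concepts stay listed but are not queried


@dataclass
class ConceptSet:
    name: str
    concepts: list[Concept] = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        self.concepts.append(concept)

    def remove(self, concept_id: int) -> None:
        # Removing deletes the concept from the set entirely
        self.concepts = [c for c in self.concepts if c.concept_id != concept_id]

    def set_included(self, concept_id: int, included: bool) -> None:
        # Excluding keeps the concept in the list but flags it as unused
        for c in self.concepts:
            if c.concept_id == concept_id:
                c.included = included

    def query_codes(self) -> list[tuple[str, str]]:
        """(vocabulary, code) pairs that queries should actually use."""
        return [(c.vocabulary, c.code) for c in self.concepts if c.included]


creatinine = ConceptSet("Creatinine")
creatinine.add(Concept(1, "Creatinine [Mass/volume] in Serum or Plasma", "LOINC", "2160-0"))
creatinine.add(Concept(2, "Creatinine [Moles/volume] in Serum or Plasma", "LOINC", "14682-9"))
creatinine.set_included(2, False)
print(creatinine.query_codes())  # [('LOINC', '2160-0')]
```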

Each concept set also lets you define:

Unit — selected from a dropdown list of UCUM (Unified Code for Units of Measure) codes, the international standard for units of measurement. For example: mg/dL, mmol/L, or {beats}/min for heart rate.
Retained min and max values — define a plausible value range to automatically exclude outliers during data extraction.
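As a sketch of how retained min and max values might be applied during extraction, the Python below drops any value outside the range. It assumes inclusive bounds and values already expressed in the set's unit; the `ValueRange` class, the `drop_outliers` function, and the 20 to 300 range used for heart rate are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueRange:
    unit: str                              # UCUM code, e.g., "mg/dL" or "/min"
    retained_min: Optional[float] = None   # None means no lower limit
    retained_max: Optional[float] = None   # None means no upper limit

    def keep(self, value: float) -> bool:
        """True if value lies within the retained range (bounds inclusive)."""
        if self.retained_min is not None and value < self.retained_min:
            return False
        if self.retained_max is not None and value > self.retained_max:
            return False
        return True


def drop_outliers(values: list[float], value_range: ValueRange) -> list[float]:
    """Keep only plausible values; assumes they are already in value_range.unit."""
    return [v for v in values if value_range.keep(v)]


heart_rate = ValueRange(unit="/min", retained_min=20, retained_max=300)
print(drop_outliers([72, 0, 85, 999], heart_rate))  # [72, 85]
```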

What are concept sets for?

A concept set can be reused in several places: in a selection criterion (e.g., “patients with a sepsis diagnosis”) as well as in a variable (e.g., “highest creatinine value”). Centralizing codes in a concept set avoids duplication: each code list is maintained in one place rather than copied into every criterion or variable that needs it.

Temporal anchors

Temporal anchors are the reference dates around which variables are collected. For example, if you want to measure creatinine “within 24 hours of admission,” the admission date is the temporal anchor.

Anchors are defined in the Anchors tab of the Variables section.

Anchor types

Several anchor types are available:

Hospital admission — the hospital admission date. You specify the occurrence (first, last, or each admission) and can filter by hospital name.
Hospital discharge — the hospital discharge date (first, last, or each discharge). You can filter by hospital name.
Care unit admission — the date of entry into a care unit. You specify the occurrence (first, last, or each) and can filter by unit name.
Care unit discharge — the date of discharge from a care unit (first, last, or each). You can filter by unit name.
Concept set event — a date linked to a concept set, for example the date of the first sepsis diagnosis. You select the relevant concept set and the occurrence (first, last, or each).
Free text — for describing an anchor that doesn’t match any of the above types.
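To make the occurrence options concrete, here is a hypothetical Python sketch of how a first, last, or each anchor could be resolved for one patient, with an optional filter on the unit (or hospital) name. The `resolve_anchor` function and the `(name, datetime)` event format are assumptions, not the tool's API. Every option returns a list because “each” can produce several anchor dates for the same patient.

```python
from datetime import datetime


def resolve_anchor(events, occurrence, name_filter=None):
    """Return the anchor date(s) for one patient.

    events: list of (unit_or_hospital_name, datetime) tuples
    occurrence: "first", "last", or "each"
    name_filter: if given, keep only events with this unit or hospital name
    """
    if occurrence not in ("first", "last", "each"):
        raise ValueError(f"unknown occurrence: {occurrence!r}")
    dates = sorted(dt for name, dt in events
                   if name_filter is None or name == name_filter)
    if not dates:
        return []                    # no matching event: no anchor
    if occurrence == "first":
        return [dates[0]]
    if occurrence == "last":
        return [dates[-1]]
    return dates                     # "each": one anchor per matching event


unit_admissions = [
    ("ICU A", datetime(2024, 3, 5, 10, 0)),
    ("Cardiology", datetime(2024, 3, 1, 8, 0)),
    ("ICU A", datetime(2024, 3, 2, 14, 0)),
]
print(resolve_anchor(unit_admissions, "first", "ICU A"))
# [datetime.datetime(2024, 3, 2, 14, 0)]
```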

Each anchor has a name (e.g., “ICU admission”, “Sepsis diagnosis”) and optional details.

Why define anchors?

In a health data study, when a measurement is collected matters as much as the measurement itself. Temporal anchors formalize this timing information and keep it consistent throughout the protocol; they are also used to automatically generate the study scripts.

Variables

The Variables tab in the same section lets you define each data point to extract. A variable corresponds to a measurement, result, or characteristic that you want to obtain for each individual in your cohort. The notions of concept, temporal anchor, collection window, and aggregation function were introduced in detail in the article on defining variables.

Creating a variable

Click Add variable to open the creation form. You fill in:

Name — the display name of the variable (e.g., “Creatinine at admission”)
Identifier — the technical name used in exports and generated scripts (e.g., creatinine_admission)
Description — a free-form description of the variable
Data type — continuous, categorical, binary, ordinal, date, or text
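The four fields above map naturally onto a small record type. In this Python sketch, `VariableDefinition` and `DataType` are hypothetical names, and the identifier rule (lowercase letters, digits, and underscores, starting with a letter) is an assumption inferred from the `creatinine_admission` example, not a documented constraint.

```python
import re
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    CONTINUOUS = "continuous"
    CATEGORICAL = "categorical"
    BINARY = "binary"
    ORDINAL = "ordinal"
    DATE = "date"
    TEXT = "text"


# Assumed rule: lowercase snake_case starting with a letter
IDENTIFIER_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


@dataclass
class VariableDefinition:
    name: str          # display name
    identifier: str    # technical name used in exports and scripts
    description: str
    data_type: DataType

    def __post_init__(self) -> None:
        # Reject identifiers that would be awkward as column or script names
        if not IDENTIFIER_PATTERN.fullmatch(self.identifier):
            raise ValueError(f"invalid identifier: {self.identifier!r}")


creatinine_adm = VariableDefinition(
    name="Creatinine at admission",
    identifier="creatinine_admission",
    description="Serum creatinine measured around hospital admission",
    data_type=DataType("continuous"),
)
```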

Variable source

Two types of variables are available:

Concept set — the variable is extracted from a previously defined concept set. You select the set from a searchable dropdown.
Computed variable — the variable is derived from structural visit data or patient demographics: age, sex, hospitalization duration, or unit stay duration.
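This article does not specify how computed variables are derived. As one plausible reading, the Python sketch below computes age in completed years on an anchor date and a stay duration in fractional days. Both functions and both conventions are assumptions made for illustration.

```python
from datetime import date, datetime


def age_at(birth_date: date, anchor: date) -> int:
    """Age in completed years on the anchor date."""
    had_birthday = (anchor.month, anchor.day) >= (birth_date.month, birth_date.day)
    return anchor.year - birth_date.year - (0 if had_birthday else 1)


def stay_duration_days(start: datetime, end: datetime) -> float:
    """Hospitalization or unit stay duration in fractional days."""
    return (end - start).total_seconds() / 86400


print(age_at(date(1960, 6, 15), date(2024, 6, 14)))  # 63
print(stay_duration_days(datetime(2024, 3, 1, 8), datetime(2024, 3, 4, 20)))  # 3.5
```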

Temporal anchor and collection window

Each variable is linked to a temporal anchor. This is the reference date from which the collection window is calculated.

The collection window specifies the interval around the anchor during which data is searched:

Start — offset relative to the anchor (a negative number means “before the anchor”)
End — offset relative to the anchor (can also be negative for a window entirely before the anchor)
Time unit — hours, days, weeks, months, or years

For example, for “the highest creatinine within 24 hours of admission”:

Anchor: admission
Start: 0
End: 24
Unit: hours
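The window arithmetic can be sketched in a few lines of Python. The function names are hypothetical. Hours, days, and weeks are fixed-length offsets; months and years are assumed here to follow calendar arithmetic, with the day clamped to the end of shorter months (so January 31 plus one month gives the last day of February). The Study Designer may handle these units differently.

```python
import calendar
from datetime import datetime, timedelta


def add_months(dt: datetime, months: int) -> datetime:
    """Calendar month arithmetic; the day is clamped to the target month's length."""
    index = dt.month - 1 + months
    year = dt.year + index // 12      # floor division also handles negative offsets
    month = index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def shift(anchor: datetime, offset: int, unit: str) -> datetime:
    """Move the anchor by a signed offset (negative means before the anchor)."""
    if unit == "hours":
        return anchor + timedelta(hours=offset)
    if unit == "days":
        return anchor + timedelta(days=offset)
    if unit == "weeks":
        return anchor + timedelta(weeks=offset)
    if unit == "months":
        return add_months(anchor, offset)
    if unit == "years":
        return add_months(anchor, 12 * offset)
    raise ValueError(f"unknown time unit: {unit!r}")


def collection_window(anchor: datetime, start: int, end: int, unit: str):
    """Return the (window_start, window_end) datetimes for one anchor."""
    return shift(anchor, start, unit), shift(anchor, end, unit)


# Highest creatinine within 24 hours of an admission at 08:00 on March 1, 2024
admission = datetime(2024, 3, 1, 8, 0)
window_start, window_end = collection_window(admission, 0, 24, "hours")
print(window_start, window_end)  # 2024-03-01 08:00:00 2024-03-02 08:00:00
```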

Aggregation function

When multiple values exist within the collection window, you choose how to summarize them:

First / Last — the first or last value chronologically
Maximum / Minimum — the highest or lowest value
Mean / Median — the average or median of the values
Presence — indicates whether at least one value exists (yes/no)
Duration — the time elapsed between the first and last value
Count — the number of values found
Sum — the sum of the values
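Here is a minimal Python sketch of the aggregation functions listed above, applied to the `(datetime, value)` pairs found in one collection window. The function name and the lowercase labels are hypothetical. Two behaviors are assumptions rather than documented rules: an empty window returns `None` (missing) for every function except Presence and Count, and Duration is returned as a `timedelta`.

```python
from datetime import datetime
from statistics import mean, median

AGGREGATIONS = {"first", "last", "maximum", "minimum", "mean", "median",
                "presence", "duration", "count", "sum"}


def aggregate(observations, function):
    """Summarize the (datetime, value) pairs found in one collection window."""
    if function not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {function!r}")
    obs = sorted(observations, key=lambda o: o[0])   # chronological order
    values = [value for _, value in obs]
    if function == "presence":
        return len(obs) > 0
    if function == "count":
        return len(obs)
    if not obs:
        return None   # assumption: an empty window yields a missing value
    if function == "first":
        return values[0]
    if function == "last":
        return values[-1]
    if function == "maximum":
        return max(values)
    if function == "minimum":
        return min(values)
    if function == "mean":
        return mean(values)
    if function == "median":
        return median(values)
    if function == "duration":
        return obs[-1][0] - obs[0][0]   # timedelta between first and last value
    return sum(values)                  # "sum"


lactate = [
    (datetime(2024, 3, 1, 12, 0), 2.1),
    (datetime(2024, 3, 1, 9, 0), 3.4),
    (datetime(2024, 3, 1, 20, 0), 1.8),
]
print(aggregate(lactate, "first"), aggregate(lactate, "last"))  # 3.4 1.8
```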

Concrete example

For a study on sepsis, you might define the following variables:

Maximum lactate at H24 — concept set “Lactate”, anchor “ICU admission”, window 0–24 hours, aggregation “Maximum”
Creatinine at admission — concept set “Creatinine”, anchor “Admission”, window −6 to +6 hours, aggregation “First”
History of type 2 diabetes — concept set “Type 2 diabetes”, anchor “Admission”, window null–0 (a null start leaves the window open-ended: all available history up to admission), aggregation “Presence”
Age — computed variable “Age”
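The example variables can also be written as declarative specifications and evaluated against one patient's data. This Python sketch is hypothetical throughout: the dictionary keys, the `evaluate` function, and the patient data are invented for illustration. Windows are limited to hours and treated as inclusive at both ends, and a `None` start stands for the null bound. The computed Age variable is listed but not evaluated here.

```python
from datetime import datetime, timedelta

# The example variables as hypothetical specs; all windows here are in hours.
# A start of None is the "null" bound: no lower limit on the window.
VARIABLES = [
    {"id": "lactate_max_h24", "concept_set": "Lactate", "anchor": "ICU admission",
     "start": 0, "end": 24, "aggregation": "maximum"},
    {"id": "creatinine_admission", "concept_set": "Creatinine", "anchor": "Admission",
     "start": -6, "end": 6, "aggregation": "first"},
    {"id": "history_t2d", "concept_set": "Type 2 diabetes", "anchor": "Admission",
     "start": None, "end": 0, "aggregation": "presence"},
    {"id": "age", "computed": "age"},   # computed variable, not evaluated below
]

AGGREGATIONS = {
    "maximum": lambda values: max(values) if values else None,
    "first": lambda values: values[0] if values else None,
    "presence": lambda values: len(values) > 0,
}


def in_window(t, anchor, spec):
    """Inclusive window test; a None start leaves the window open to the past."""
    lower = None if spec["start"] is None else anchor + timedelta(hours=spec["start"])
    upper = anchor + timedelta(hours=spec["end"])
    return (lower is None or t >= lower) and t <= upper


def evaluate(spec, anchors, observations):
    """anchors: {anchor name: datetime}; observations: {concept set: [(datetime, value)]}."""
    anchor = anchors[spec["anchor"]]
    obs = sorted(observations.get(spec["concept_set"], []), key=lambda o: o[0])
    values = [v for t, v in obs if in_window(t, anchor, spec)]
    return AGGREGATIONS[spec["aggregation"]](values)


anchors = {"Admission": datetime(2024, 3, 1, 6, 0),
           "ICU admission": datetime(2024, 3, 1, 10, 0)}
observations = {
    "Lactate": [(datetime(2024, 3, 1, 11, 0), 3.2), (datetime(2024, 3, 1, 18, 0), 4.1),
                (datetime(2024, 3, 2, 12, 0), 5.0)],   # last one falls after H24
    "Creatinine": [(datetime(2024, 3, 1, 9, 0), 1.6), (datetime(2024, 3, 1, 2, 0), 1.4)],
    "Type 2 diabetes": [(datetime(2019, 5, 10), True)],
}
results = {spec["id"]: evaluate(spec, anchors, observations)
           for spec in VARIABLES if "concept_set" in spec}
print(results)
# {'lactate_max_h24': 4.1, 'creatinine_admission': 1.4, 'history_t2d': True}
```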

Table and timeline

Defined variables are displayed in a summary table that shows for each variable: the name, unit, temporal anchor, collection window, and aggregation function. You can edit or delete each variable from this table.

A timeline view is also available. It visually represents the temporal anchors and collection windows of each variable as horizontal bars. This is a good way to check at a glance that all windows are consistent.
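A text-mode approximation of the timeline can help when reviewing a protocol outside the tool. This Python sketch is hypothetical and assumes all windows are expressed in hours relative to a single shared anchor. It draws each window as a bar of `#` characters, marks the anchor with `^`, and flags any window whose start comes after its end, one of the inconsistencies a timeline makes easy to spot.

```python
def render_timeline(windows, lo=-48, hi=48, width=48):
    """Draw (label, start, end) windows, in hours relative to the anchor, as text bars.

    A start of None (open-ended window) is drawn from the left edge.
    Windows whose start is after their end are flagged instead of drawn.
    """
    scale = width / (hi - lo)          # characters per hour
    rows = []
    for label, start, end in windows:
        if start is not None and start > end:
            rows.append(f"{label:<22} !! start after end")
            continue
        s = lo if start is None else max(start, lo)
        e = min(end, hi)
        a = min(int((s - lo) * scale), width - 1)
        b = max(int((e - lo) * scale), a + 1)   # draw at least one cell
        rows.append(f"{label:<22} " + "." * a + "#" * (b - a) + "." * (width - b))
    anchor_col = int((0 - lo) * scale)
    rows.append(" " * 23 + " " * anchor_col + "^ anchor")
    return "\n".join(rows)


example = [
    ("Lactate max H24", 0, 24),
    ("Creatinine admission", -6, 6),
    ("History T2D", None, 0),
    ("Bad window", 12, 6),
]
print(render_timeline(example))
```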

Key takeaways

Concept sets group medical codes (ICD-10, LOINC, ATC…) into reusable sets for criteria and variables, with a unit (UCUM) and retained min/max values to exclude outliers.
Temporal anchors define reference dates (admission, discharge, clinical event…) around which variables are collected.
Each variable is linked to an anchor, a collection window, and an aggregation function that specify when and how to extract the data.
The timeline view provides a visual overview of all variable collection windows at a glance.

Next article: Finalize and export the protocol