Mapping is not enough: the three levels of concept derivation

Summary

Federated research assumes every site exposes its data in the same format, mapped to the same concepts. But data coming out of electronic health records is often raw: the target concept either does not exist as such, or it exists but is not reliable. You then have to derive it from the source concepts. I propose a three-level framework — no derivation, interoperable derivation, local derivation — and a simple rule for picking the right level, concept by concept.

The starting point: federation assumes aligned data

I work on INDICATE, a European project building a federated infrastructure for intensive care data. In a federated model, the data does not move — the analysis travels instead. Code is sent to each hospital, runs locally, and only aggregate results come back. Patient data never leaves its home institution.

Centralized model: data from hospitals converges into a single database — **Centralized** — data converges into a single warehouse.

Federated model: the analysis travels from the center to each hospital, data stays in place — **Federated** — the analysis travels to the sites, data stays in place.

Illustrations from the INDICATE course Data model training — Session 1 — Onboarding and data model.

For this to work, there is one non-negotiable prerequisite: every site must expose its data in the same format, mapped to the same concepts. Concretely, each site maps its source concepts — its local codes — onto standard target concepts (SNOMED, LOINC, RxNorm…). When a study is run in federation, the scripts query every site at once: the mapping must therefore be consistent across sites, otherwise the scripts do not work, or return biased results.

That is the whole point of interoperability, and it is what our Data Dictionary is for: harmonizing that semantic mapping across sites, in particular through expert comments from each domain, which state which concept to keep, which to exclude, and why.

This is how I presented that mapping, in our article Building an OMOP ETL: semantic mapping is the bulk of the work, and loading data into the OMOP common model then comes down to mere routing — once you know which target concept each source concept maps to, you just send each record to the right table. That is true for a sodium measurement. It breaks down the moment the target concept does not exist as such in the source data.

The real problem: raw data

The data in an electronic health record is entered by clinicians, through software that differs from one hospital to the next — and so the way it is filled in differs too. It reaches the health data warehouse through an ETL process, often as-is — that is, raw. (Our articles Health data warehouses and From raw data to usable data cover both notions in detail.)

The target concept can then be in one of three situations:

it already exists, standardized and reliable, in the data;
it does not exist as such, but can be rebuilt from other concepts that are already standard;
it does not exist, and even the variables you would use to rebuild it have no standard model — how they are recorded depends on the software and on each team’s habits.

These three situations call for three different responses — three routes to the same target concept. They are not tiers you climb one after another: depending on the concept, you take one route or another, and some routes combine. The diagram below sums it up.

Three derivation routes from raw data to a standard target concept: no derivation; local derivation (direct or through an interoperable derivation); interoperable derivation — The three derivation routes: from the source concept to the standard target concept. Depending on the concept, you take one of them — and the local route can extend into an interoperable derivation.

Level 1 — no derivation

The target concept already exists in the data, standardized and reliable. The only work is mapping: linking the source code (or local variable) to the right standard concept. No logic, no computation.

This is the ideal case, the one that comes to mind when we talk about interoperability. A blood glucose, a measured heart rate, a SNOMED diagnosis code already present: point them at the matching standard concept and you are done.

The residual difficulty at this level is choosing the right codes among hundreds of candidates — a single “sodium” can map to dozens of LOINC codes depending on specimen, method, units. But that is a semantic selection problem, not a reconstruction problem.

Concrete examples — level 1

Heart rate: the value is already in the data under a recognized code → map this source concept onto the standard target concept, done.
Blood glucose: a lab result with its LOINC code → direct mapping.
A SNOMED-coded diagnosis already present (e.g. a pneumonia code) → point it at the target concept.

Level 2 — interoperable derivation

The target concept does not exist as such, but it can be rebuilt from other concepts that are themselves already standard. Since the ingredients here are standard, the reconstruction recipe is generic: write it once and it holds for every site. Hence the name: interoperable derivation.

The textbook example is invasive ventilation. A prescription of invasive ventilation, taken on its own, is unreliable: an order may linger because someone forgot to stop it, or else the ventilator settings may not flow back into the ICU software for a while, and so on. We do not discard it for all that: we combine it with other concepts, also measured in a standard way:

a plateau pressure (SNOMED code 264907004 — OMOP concept 4139635) can only be measured on an intubated patient: its presence strongly suggests invasive ventilation;
a PEEP (SNOMED code 250854009 — OMOP concept 4353713) does not distinguish non-invasive from invasive ventilation, but it rules out plain oxygen therapy;
the tidal volume (SNOMED code 13621006 — OMOP concept 4029625) and the ventilation mode complete the picture.

Taken one by one, each of these variables is fallible and insufficient: none is enough to conclude on its own. It is their combination, run through expert rules, that yields a standardized, reliable “Invasive ventilation” concept. And that rule is transferable: the same reasoning holds in Rennes, in Amsterdam or in Seville.

“Was this patient ventilated?” — depending on the data source

The patient is intubated from D0 to D2. No raw source, taken alone, recovers that window.

Clinical reality (D0 → D2)Raw source

No source is reliable on its own. The order stays open a day too long (never cleared), the plateau pressure stops before the end of invasive ventilation, and PEEP reappears under non-invasive ventilation after weaning. It is their combination, run through expert rules, that recovers the D0 → D2 window.

Another level-2 example: acute kidney injury under the KDIGO criteria. It is defined almost entirely by a rule on serum creatinine (change from a baseline value) and urine output (mL/kg/h). If those two variables are mapped, you get the “acute kidney injury” concept through a purely interoperable script — without dropping to the local derivation level. That matters: not every concept requires local work. That said, urine output shows the boundary is not clear-cut: depending on how it is recorded (hourly totals, spot readings, bags…), mapping it cleanly may, at some sites, require a local script upstream.

Concrete examples — level 2

Invasive ventilation: a plateau pressure + PEEP + tidal volume → rule → writes the “Invasive ventilation” concept.
Acute kidney injury (KDIGO): a rule on creatinine + urine output → writes the “AKI” concept.
SOFA score: computed from already-standard concepts (PaO₂/FiO₂, platelets, bilirubin, mean arterial pressure and vasopressors, Glasgow, creatinine) → an aggregation rule → writes the score.

The ingredients are concepts that are already standard everywhere: the rule is written once and runs at every site.

Level 3 — local derivation

Here, the target concept does not exist, and the variables that would let you rebuild it have no standard model — or their reliability depends entirely on how each team records them. You then need a site-specific script. This level is not standardizable, and that is fine.

Take prone positioning. The target concept itself exists in standard form, reliable — Prone body position (SNOMED code 1240000 — OMOP concept 4050473).

The problem is upstream. How do you know a patient was in the prone position? In my center, that information is reconstructed from three variables:

Patient position, which can take the values prone, supine, left lateral, right lateral, semi-recumbent;
Head position, supine or prone for instance;
the order to place the patient prone (Placing subject in prone position, SNOMED code 431182000 — OMOP concept 4196006) — the prescribed act, to be distinguished from the patient actually being in the prone position.

That order, taken on its own, is unreliable: the act may have been performed without being ordered, or the order may not have been cleared when it stopped, leaving an erroneous end date. And querying the concept database shows that there is no standard observable concept with a standardized value set for the first two. SNOMED does have a “Body position” observable (concept 4287468), but it is not bound to a value list like { prone, supine, left lateral, right lateral, semi-recumbent }; on the LOINC side, the answer list is limited to Sitting / Lying / Standing. For head position, nothing standardized with those values. All the more so as these values vary from one site to the next, and even more from one country to another.

You therefore need a local script that takes these variables and applies site-specific rules: which parameter to favor when they conflict, which to treat as the most reliable. As output, it produces the standard concept Prone body position. And what holds for my center will not hold for another: a neighboring hospital, with different software and different recording habits, will write a different script to reach the same target concept.

This is why, at this level, each team must work in pairs: a data scientist and a healthcare professional — the end user who actually enters the data in the software. They alone know how the information is recorded in practice: neither the data scientist alone nor the clinician alone is enough.

Concrete examples — level 3

Prone position: “patient position” (prone/supine/lateral…) + “head position” + the order to place the patient prone — variables with no standard values, or unreliable → local script → writes Prone body position.
Free-text data: anything recorded as text (symptoms, medical history…) must be processed into structured concepts — a report mentions “fever, headache, retro-orbital pain” → local natural language processing (NLP) → standard concepts.

The script depends on the software and recording habits: another site will write its own to reach the same target concept.

All three levels at once: the dengue example

A case that was reported to me illustrates how these levels can coexist, and even compose, for one and the same target concept. To determine whether a patient has dengue, you rely on diagnostic criteria. The WHO 2009 classification (Figure 1.4, p. 11) organizes them along two axes: a severity axis (dengue, with or without warning signs, vs severe dengue) and a certainty axis (probable dengue, clinical, vs confirmed dengue, laboratory). Probable dengue, for instance, combines clinical and laboratory elements:

The context

The patient lives in or returns from a dengue-endemic area, and has a fever.

At least two clinical signs

Among:

nausea / vomiting
rash
aches and pains
positive tourniquet test
leukopenia
any warning sign (abdominal pain, persistent vomiting, mucosal bleeding, lethargy, liver enlargement > 2 cm…)

Laboratory confirmation

A positive test (PCR, NS1 antigen, IgM/IgG serology) moves the case from probable to confirmed.

At the scale of a health data warehouse, gathering all that information is not straightforward — and depending on what the database already holds, you draw on one, two, or all three levels. That is what makes dengue the complete example, where all three derivation cases show up:

sometimes dengue is already structured as a diagnosis in the warehouse — a source “dengue” code entered by the clinician. You then simply map it onto the standard concept Dengue (SNOMED code 38362002 — OMOP concept 440022): that is level 1, no derivation. When that code exists and you deem it reliable, there is no need to go further;
when it is not coded as such, you reconstruct it. Some clinical signs are recorded in free text: you need NLP to extract them — typically, spotting symptoms in clinical reports — and produce structured concepts. That is local derivation (the text, its language, its abbreviations are specific to each site), level 3;
once those signs and the lab results (PCR, NS1, IgM/IgG serology) are available as mapped concepts, you can write an interoperable script — level 2 — that applies the WHO criteria over those concepts to produce the target “Dengue” concept. The idea, in pseudo-code:

if (lives in / travelled to an endemic area) and fever
   and (≥ 2 of: nausea/vomiting, rash, aches and pains,
                positive tourniquet test, leukopenia, warning sign):
    dengue = "probable"
    if (PCR+ or NS1+ or IgM/IgG serology+):
        dengue = "confirmed"
    if (shock or severe bleeding or severe organ involvement):
        dengue = "severe"
→ write the "Dengue" concept (with its certainty / severity level)

And NLP is not only for extracting symptoms: sometimes it is the dengue diagnosis itself that is spelled out in a clinical report. Spotting it is still level 3 (a local script, tied to the site’s language and habits), but what it produces is directly the target concept — so you go straight from level 3 to level 1, with no interoperable rule in between.

So, depending on the site, you take one path or another: a plain mapping if dengue is already coded (level 1); extracting the diagnosis from text (level 3) that lands directly on the concept (level 1); or a climb from level 3 (local NLP) to level 2 (interoperable rules) before reaching the target concept, as the figure below sums up.

Deriving the “Dengue” concept brings together all three levels: if dengue is already coded, a plain mapping suffices (no derivation); otherwise a local script structures the text — the symptoms then feed the interoperable script that applies the WHO criteria, or the diagnosis, extracted as-is, lands directly on the concept.

Sepsis follows the same pattern: the SOFA score is computed by interoperable rules (lactate, platelets, bilirubin, PaO₂/FiO₂ ratio, vasopressor support, Glasgow), but “suspicion of infection” — pairing a microbiological sample with an antibiotic order within a time window — often requires local scripts.

The rule: think concept by concept

There is no “default” level. The right level is decided for each target concept, from a single question:

Does the target concept exist, standardized and reliable, in the source data?

Yes → level 1, no derivation, plain mapping.
No, but it is reconstructible from already-standard concepts → then it is worth writing an interoperable script that can be shared with the community (level 2).
No, and the source variables have no standard model / their reliability depends on recording habits → level 3, local derivation, data scientist + clinician pair.

And the paths compose: sometimes you go straight from local to target (prone positioning), sometimes from local to interoperable to target (dengue, sepsis), sometimes you stay entirely interoperable (acute kidney injury).

One last reflex worth keeping: as soon as a derivation is interoperable, it is worth sharing with the community. A script that depends only on standard concepts will work elsewhere as-is — publishing it spares every team from rewriting it and speeds up their mapping work. What is local will stay local; but anything that can be pooled should be.

Federation assumes data mapped to the same concepts — but raw data does not always contain the target concept.
Three levels of concept derivation: no derivation (plain mapping of the source concept onto the target concept), interoperable derivation (generic rules over standard concepts), local derivation (site-specific script).
The level is decided concept by concept, from one question: does the target concept exist, standardized and reliable, in the source concepts?
Local derivation requires a data scientist + healthcare professional pair: only the end user knows how the data is recorded.
As soon as a derivation is interoperable, share it: a script that depends only on standard concepts serves everyone and speeds up other sites' mapping.