TL;DR
Every hospital names and codes its data differently. To compare data across institutions or participate in multi-center studies, everyone needs to speak the same language. That’s the role of standardized medical terminologies: shared vocabularies, each specialized for a specific type of data — diagnoses, laboratory tests, medications, clinical procedures.
The problem: every hospital has its own language
In the previous article, we saw that warehouse data is organized in linked tables. In the laboratory table, for example, a parameter column contains the test name: “Creatinine,” “White blood cells,” “Glucose”…
These names are not universal. From one hospital to another, the same creatinine test can be called:
| Hospital | Label in the system | Unit |
|---|---|---|
| Hospital A | Creatinine | µmol/L |
| Hospital B | S-Creat | mg/dL |
| Clinic C | Creat. serum | µmol/L |
Three different names, two different units — for the same test.
As long as you work within a single hospital, this is not a problem: everyone knows that “S-Creat” means creatinine. As soon as you want to compare data across hospitals for a multi-center study, you need to agree on a common language.
That is exactly the role of medical terminologies.
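Naming is only half of the problem in the table above: the units differ too. As a minimal sketch, the snippet below brings the three creatinine results onto a single unit. The conversion factor follows from creatinine's molar mass (about 113.12 g/mol, so 1 mg/dL ≈ 88.4 µmol/L); the function name and the sample values are illustrative assumptions, not taken from any hospital system.

```python
# Creatinine molar mass is about 113.12 g/mol, so 1 mg/dL ≈ 88.4 µmol/L.
MG_DL_TO_UMOL_L = 88.4


def creatinine_to_umol_l(value: float, unit: str) -> float:
    """Convert a creatinine result to µmol/L (hypothetical helper)."""
    if unit == "µmol/L":
        return value
    if unit == "mg/dL":
        return value * MG_DL_TO_UMOL_L
    raise ValueError(f"Unsupported unit: {unit!r}")


# Made-up values, using the labels and units from the table above.
results = [
    ("Hospital A", "Creatinine", 80.0, "µmol/L"),
    ("Hospital B", "S-Creat", 1.0, "mg/dL"),
    ("Clinic C", "Creat. serum", 95.0, "µmol/L"),
]

harmonized = {
    site: round(creatinine_to_umol_l(value, unit), 1)
    for site, _label, value, unit in results
}
print(harmonized)
```

Raising an error on an unknown unit, rather than passing the value through, is a deliberate choice here: a silent unit mismatch is much harder to detect later than a failed conversion.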
Classification, terminology, ontology?
These three terms describe increasing levels of complexity [1,2]:
- Classification: a system that organizes concepts into categories to describe a domain in a structured way. Some (such as ICD-10) require mutually exclusive categories to avoid double-counting.
- Terminology: a structured list of concepts with definitions and relationships between them — a concept can have multiple parents (example: LOINC).
- Ontology: goes further by adding formal semantic relationships (description logic) that enable automated machine reasoning (example: SNOMED CT).
A few terminologies to know
ICD-10 — The classification of diagnoses
If you are a clinician working in a hospital, you probably already know ICD-10 — you use it every time you code a hospital stay.
ID card
Full name: International Classification of Diseases, 10th Revision
Type: Classification
Created by: World Health Organization (WHO)
First version: Endorsed by WHO in 1990, published 1992, effective 1993 [3]
Domain: Diagnoses
Number of codes: ~14,400 (WHO base version), up to ~70,000 with national extensions (e.g., ICD-10-CM in the US) [4]
Example
E11.9
Type 2 diabetes mellitus, without complications
| Part | Meaning |
|---|---|
| E | Endocrine diseases |
| 11 | Type 2 diabetes |
| .9 | Without complications |
Structure: each code starts with a letter (which indicates the chapter, a major disease category), followed by two digits (the category), then optionally a decimal point and additional digits (for precision). For example, codes beginning with the letter I cover diseases of the circulatory system, and code I21.0 designates an acute myocardial infarction of the anterior wall.
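This structure can be checked mechanically. The sketch below splits a code into its parts with a regular expression. It assumes the simple pattern described here (one letter, two digits, optional decimal part); national variants such as ICD-10-CM allow longer codes and some alphanumeric categories that this pattern does not cover. The function name is illustrative.

```python
import re

# One letter, two digits, then optionally a dot and up to four more characters.
# Simplified on purpose: not every national variant fits this pattern.
ICD10_PATTERN = re.compile(r"^([A-Z])([0-9]{2})(?:\.([0-9A-Z]{1,4}))?$")


def split_icd10(code: str) -> dict:
    """Split an ICD-10 code into letter, category and optional detail."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognizable ICD-10 code: {code!r}")
    letter, digits, detail = match.groups()
    return {
        "letter": letter,             # e.g. "E"
        "category": letter + digits,  # e.g. "E11"
        "detail": detail,             # e.g. "9", or None if absent
    }


print(split_icd10("E11.9"))
print(split_icd10("I21.0"))
print(split_icd10("E11"))
```

Grouping diagnoses by the `category` key (E11) rather than the full code is a common way to count all type 2 diabetes stays, whatever the complication digit.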
Strengths: it is the most widely used classification in the world. Virtually every country uses it. Hospital administrative data (DRG-based systems worldwide) rely on ICD-10 to describe hospital stays and calculate reimbursements.
Limitations: ICD-10 was designed for mortality statistics and billing, not for clinical research. Its granularity is low: it captures neither severity, nor timing, nor the degree of certainty of a diagnosis. The codes assigned reflect what was documented for reimbursement, not necessarily the full clinical picture.
ICD-10 and research: beware of bias
ICD-10 codes are often incomplete or imprecise. One study of ambulatory visits found that only 56% of entered diagnostic codes were appropriate, and that 26% of relevant diagnoses were simply not coded [5]. For research purposes, ICD-10 data is a starting point, but it is not always sufficient.
What about ICD-11?
The 11th revision of the ICD was adopted by the World Health Assembly in 2019 and came into effect on 1 January 2022. It significantly improves the ontological structure and granularity. However, international adoption remains very limited: most healthcare systems still use ICD-10.
LOINC — The language of laboratory medicine
ID card
Full name: Logical Observation Identifiers Names and Codes
Type: Terminology
Created by: Regenstrief Institute (Indianapolis, USA)
First version: 1994 [6]
Domain: Laboratory tests, clinical observations, scores, questionnaires
Number of codes: > 100,000 [6]
Example
2160-0
Serum / plasma creatinine
LOINC addresses a simple problem: when a laboratory sends a test result, what exactly was measured? Each LOINC code uniquely identifies a test through six dimensions:
| Dimension | Question | Example |
|---|---|---|
| Component | What is being measured? | Creatinine |
| Property | What type of measurement? | Mass concentration |
| Timing | At what point in time? | Point in time |
| System | From which specimen? | Serum / Plasma |
| Scale | Quantitative or qualitative? | Quantitative |
| Method | Which analytical technique? | (not specified) |
The six dimensions of a LOINC code, illustrated with serum creatinine.
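One way to picture these six dimensions is as the fields of a small record. The sketch below is an illustrative data structure, not the official LOINC file format; the class and field names are assumptions, and the values are those from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoincAxes:
    """The six dimensions of one LOINC code (illustrative structure)."""
    code: str
    component: str
    measured_property: str
    timing: str
    system: str
    scale: str
    method: Optional[str] = None  # left empty when no method is specified

    def describe(self) -> str:
        parts = [self.component, self.measured_property, self.timing,
                 self.system, self.scale]
        if self.method:
            parts.append(self.method)
        return f"{self.code}: " + " | ".join(parts)


creatinine = LoincAxes(
    code="2160-0",
    component="Creatinine",
    measured_property="Mass concentration",
    timing="Point in time",
    system="Serum / Plasma",
    scale="Quantitative",
)
print(creatinine.describe())
```

Two tests that differ on any single field (for example, urine instead of serum) would be two different records, which mirrors why they receive two different LOINC codes.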
LOINC covers laboratory medicine (chemistry, hematology, microbiology, serology, toxicology), but also clinical observations (scores like SOFA or Glasgow, imaging results, quality-of-life questionnaires) and vital signs.
The problem LOINC solves
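In one hospital, the creatinine test is called “CREA.” In another, “Serum Creatinine.” In a third, “S-Creat.” All three tests correspond to the same LOINC code, 2160-0, as long as each reports a mass concentration (for example in mg/dL); a result reported as a molar concentration (µmol/L) is a different property and has its own LOINC code. By aligning (or mapping) local codes to LOINC, you can compare laboratory results across institutions, even if each uses a different software system.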
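In code, this alignment is essentially a lookup table keyed by site and local label. The sketch below pools results from three sites under one LOINC code. The site identifiers and the values (in mg/dL) are made up, and a real mapping table would be curated by each institution.

```python
from collections import defaultdict

# Local labels from the paragraph above; site identifiers are hypothetical.
LOCAL_TO_LOINC = {
    ("hospital_1", "CREA"): "2160-0",
    ("hospital_2", "Serum Creatinine"): "2160-0",
    ("hospital_3", "S-Creat"): "2160-0",
}

# Made-up result rows: (site, local label, value in mg/dL).
rows = [
    ("hospital_1", "CREA", 0.9),
    ("hospital_2", "Serum Creatinine", 1.1),
    ("hospital_3", "S-Creat", 1.0),
]

# Group values by standard code so they can be analyzed together.
pooled = defaultdict(list)
for site, label, value in rows:
    loinc_code = LOCAL_TO_LOINC[(site, label)]
    pooled[loinc_code].append(value)

print(dict(pooled))
```

Keying the table on the pair (site, label) rather than the label alone avoids collisions when two hospitals use the same abbreviation for different tests.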
SNOMED CT — The most comprehensive clinical vocabulary
ID card
Full name: Systematized Nomenclature of Medicine — Clinical Terms
Type: Terminology and ontology [7]
Maintained by: SNOMED International (formerly IHTSDO)
First version: 2002 (merger of SNOMED RT and Clinical Terms Version 3) [8]
Domain: Clinical conditions, procedures, anatomy, organisms, substances…
Number of concepts: ~370,000 active concepts [9]
Example
44054006
Type 2 diabetes mellitus
Where ICD-10 classifies diseases into broad categories, SNOMED CT is a true clinical vocabulary: it describes the medical world in all its richness. It covers not only clinical conditions (diseases, symptoms), but also procedures, anatomy, organisms (bacteria, viruses), substances, medical devices…
The distinctive feature of SNOMED CT is its polyhierarchy: a concept can belong to multiple categories simultaneously. For example, myocardial infarction is both a “cardiac disorder” and an “ischemic disorder.” This structure enables powerful queries: you can search for all cardiac disorders at once, and myocardial infarction will be included automatically.
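The query described above amounts to walking a graph in which a concept may have several parents. The sketch below uses a toy hierarchy with plain labels; these are not real SNOMED CT identifiers, and the relationships are simplified for illustration.

```python
# Toy "is a" hierarchy: child -> list of parents. Labels only, not real
# SNOMED CT identifiers; the real hierarchy is far larger and richer.
PARENTS = {
    "Myocardial infarction": ["Cardiac disorder", "Ischemic disorder"],
    "Heart failure": ["Cardiac disorder"],
    "Ischemic stroke": ["Ischemic disorder"],
    "Cardiac disorder": ["Clinical finding"],
    "Ischemic disorder": ["Clinical finding"],
}


def descendants(ancestor: str) -> set:
    """Return every concept that has `ancestor` somewhere above it."""
    found = set()
    stack = [ancestor]
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(descendants("Cardiac disorder")))
print(sorted(descendants("Ischemic disorder")))
```

In this toy graph, myocardial infarction is returned by both queries because it has two parents, which is exactly what a strict single-parent classification cannot express.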
SNOMED CT is also an ontology: its concepts are defined by formal semantic relationships (description logic), which enables a machine to automatically reason over the data [7].
SNOMED CT vs ICD-10: complementary, not competing
ICD-10 is a classification designed for counting and billing. SNOMED CT is a terminology / ontology designed for detailed clinical description. A hospital can use ICD-10 for administrative coding and SNOMED CT for research — they are not alternatives but complementary tools, each with its own purpose [2].
ATC — The classification of medications
ID card
Full name: Anatomical Therapeutic Chemical Classification System
Type: Classification
Created by: WHO Collaborating Centre for Drug Statistics Methodology (Oslo, Norway)
First version: 1976 [10]
Domain: Medications (active substances)
Number of codes: > 6,300 substances at the finest level [11]
Example
C10AA01
Simvastatin
The ATC system classifies medications according to a five-level hierarchy, from the most general to the most specific:
| Level | Code | Meaning |
|---|---|---|
| 1 — Anatomical system | C | Cardiovascular system |
| 2 — Therapeutic subgroup | C10 | Lipid modifying agents |
| 3 — Pharmacological subgroup | C10A | Lipid modifying agents, plain |
| 4 — Chemical subgroup | C10AA | HMG-CoA reductase inhibitors (statins) |
| 5 — Active substance | C10AA01 | Simvastatin |
The ATC hierarchy, from target organ to molecule.
This hierarchy is very practical for research: you can search for all patients on “statins” (level 4: C10AA) without having to list each molecule individually (simvastatin, atorvastatin, rosuvastatin…).
ATC in practice
A researcher wants to study antibiotic use in an ICU. Rather than searching for each molecule one by one, they use ATC code J01 (antibacterials for systemic use) and automatically capture all systemic antibacterials, regardless of which specific molecule was prescribed.
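Because each ATC level is a prefix of the next, class-level searches reduce to simple string operations. The sketch below assumes 7-character substance codes; the patient identifiers and the prescription list are made up, and the helper names are illustrative.

```python
# Character lengths of the five ATC levels: C / C10 / C10A / C10AA / C10AA01.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list:
    """Return the code truncated at each of the five ATC levels."""
    if len(code) != 7:
        raise ValueError(f"Expected a 7-character ATC code, got {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


# Made-up prescriptions: (patient id, ATC code of the active substance).
prescriptions = [
    (1, "C10AA01"),  # simvastatin
    (2, "C10AA05"),  # atorvastatin
    (3, "J01CA04"),  # amoxicillin
    (4, "C10AA07"),  # rosuvastatin
]


def patients_on(atc_prefix: str) -> list:
    """Patients with at least one prescription under the given ATC prefix."""
    return [pid for pid, code in prescriptions if code.startswith(atc_prefix)]


print(atc_levels("C10AA01"))
print(patients_on("C10AA"))  # statins
print(patients_on("J01"))    # antibacterials for systemic use
```

With this sample data, the statin query selects patients 1, 2 and 4 without naming any individual molecule, and the J01 query selects patient 3.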
RxNorm — The drug terminology
ID card
Full name: RxNorm
Type: Terminology
Created by: National Library of Medicine (NLM, USA)
Domain: Medications (active ingredients, strengths, dose forms, brand names)
Example
36567
Simvastatin (ingredient level; each strength and dose form, such as simvastatin 20 mg oral tablet, has its own code)
Where ATC classifies medications by therapeutic group (“statins”), RxNorm identifies the specific product: which active ingredient, at what dose, in what form. It is the reference terminology for describing prescriptions and dispensations.
RxNorm is produced by the NLM and covers drugs marketed in the United States. For medications from other countries, the OHDSI network created RxNorm Extension: an extension that uses the same structure to represent international products (brand names, strengths, and dose forms specific to each country).
ATC vs RxNorm: two levels of description
ATC and RxNorm are complementary. ATC answers the question “which therapeutic class does this medication belong to?” (e.g., C10AA = statins). RxNorm answers “which specific medication was prescribed?” (e.g., simvastatin 20 mg tablet). In a data warehouse, both are useful: ATC for class-level analyses, RxNorm for prescription details.
What about IDMP?
IDMP (Identification of Medicinal Products) is a set of five ISO standards (ISO 11615, 11616, 11238, 11239, 11240) aiming to create a universal identification system for medications — substances, dose forms, strengths, brand names [13]. The UMC (Uppsala Monitoring Centre), a WHO collaborating centre, is responsible for maintaining substance identifiers (GSID) and pharmaceutical product identifiers (PhPID) [14]. The EMA is progressively mandating IDMP in Europe (2025-2026 deadlines), and the FDA is working towards adoption in the United States [13]. However, implementation is still ongoing and these standards are not yet used in research data warehouses — for now, RxNorm remains the de facto standard in that context.
Overview: which terminology for which type of data?
| Data type | Reference terminology | Example code |
|---|---|---|
| Diagnoses | ICD-10 (classification) / SNOMED CT (ontology) | E11.9 / 44054006 |
| Laboratory tests | LOINC (terminology) | 2160-0 |
| Medications | ATC (classification) / RxNorm + RxNorm Extension (terminology) | C10AA01 / 36567 |
| Vital signs | LOINC / SNOMED CT | 8867-4 (heart rate) |
| Procedures | SNOMED CT (ontology) | 80146002 (appendectomy) |
Each terminology has its area of expertise. In practice, a data warehouse contains codes from multiple terminologies simultaneously — ICD-10 for diagnoses, local laboratory codes (to be aligned to LOINC), national medication codes (to be aligned to ATC or RxNorm), etc.
The challenge of mapping
Knowing that LOINC or SNOMED CT exist is not enough. The real challenge is matching each hospital’s local codes with standardized codes. This process is called mapping.
Mapping: from local code to standard code
| Hospital | Local code | Standard code (LOINC) |
|---|---|---|
| Hospital A | CREA | 2160-0 |
| Hospital B | Serum Creatinine | 2160-0 |
Two different local codes → one single standard code. That’s mapping.
This mapping work is substantial. A hospital may have hundreds of local laboratory codes to map to LOINC, thousands of diagnostic codes to map to ICD-10 or SNOMED CT, and just as many medication codes to map to ATC or RxNorm.
Expert work, not a click
Mapping cannot be fully automated. It requires both technical expertise (understanding the terminologies) and clinical expertise (understanding what the local code actually represents). It is similar to the data quality work described in article 4: laborious, but durable.
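Even if the mapping decisions themselves need an expert, code can help prioritize the work. The sketch below measures how many result rows are already covered by a partial mapping table and lists the unmapped local codes by frequency. The local codes "GLU-POC" and "WBC" and all the data are invented for illustration.

```python
from collections import Counter

# Hypothetical partial mapping table, built up by experts over time.
local_to_loinc = {
    "CREA": "2160-0",
    "Serum Creatinine": "2160-0",
}

# Made-up local codes as they appear in result rows.
observed_codes = [
    "CREA", "CREA", "GLU-POC", "Serum Creatinine",
    "WBC", "GLU-POC", "GLU-POC",
]

# Count rows whose local code has no standard equivalent yet.
unmapped = Counter(code for code in observed_codes if code not in local_to_loinc)
mapped_rows = len(observed_codes) - sum(unmapped.values())
coverage = mapped_rows / len(observed_codes)

print(f"Row-level coverage: {coverage:.0%}")
# Most frequent unmapped codes first: these are queued for expert review.
for code, count in unmapped.most_common():
    print(code, count)
```

Sorting by frequency reflects a practical observation: a small number of local codes often accounts for most rows, so mapping those first raises coverage fastest.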
ATHENA: the reference dictionary
To facilitate this mapping work, the OHDSI network (Observational Health Data Sciences and Informatics) maintains a tool called ATHENA — a browser that brings together more than 10 million concepts from 136 different vocabularies, with over 28 million relationships between them [12].
ATHENA allows you to search for a concept (for example, “creatinine”), see all corresponding codes across all terminologies, and explore hierarchical relationships (what are the parent concepts? the child concepts?). It is the reference tool for anyone working on health data standardization.
In practice: data dictionaries
When a multi-center research project starts, the first step is to define a data dictionary: the list of variables to collect, with the standard code to use for each one.
A good example is the INDICATE project (A Federated Infrastructure for ICU Data Across Europe), launched in 2024. This European project aims to connect ICU data from multiple countries through a federated infrastructure — data stays within each hospital, but analyses can be executed in a coordinated manner.
To achieve this, INDICATE defined a data dictionary of 332 concept sets, organized into nine categories:
| Category | Concept sets |
|---|---|
| Demographics & encounters | 14 |
| Clinical conditions | 17 |
| Clinical observations | 21 |
| Vital signs | 10 |
| Laboratory tests | 76 |
| Microbiology | 48 |
| Ventilation | 26 |
| Medications | 112 |
| Procedures | 8 |
The 332 concept sets from the INDICATE data dictionary, organized by category.
Each concept set is defined using SNOMED CT codes (for conditions and observations), LOINC (for laboratory tests), and RxNorm (for medications). Every hospital participating in the project knows exactly what data to provide and in what format — that is the power of a standardized data dictionary.
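A data dictionary of this kind can be represented as a list of concept sets, each naming its vocabulary and codes. The sketch below is an illustrative structure, not the actual INDICATE format: the concept set names and groupings are assumptions, and the codes are the examples used earlier in this article. The category counts are those listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSet:
    name: str
    category: str
    vocabulary: str
    codes: list = field(default_factory=list)


# Illustrative entries only; names and groupings are invented.
dictionary = [
    ConceptSet("Serum creatinine", "Laboratory tests", "LOINC", ["2160-0"]),
    ConceptSet("Type 2 diabetes", "Clinical conditions", "SNOMED CT", ["44054006"]),
    ConceptSet("Simvastatin", "Medications", "RxNorm", ["36567"]),
]

# Concept set counts per category, as listed above.
CATEGORY_COUNTS = {
    "Demographics & encounters": 14,
    "Clinical conditions": 17,
    "Clinical observations": 21,
    "Vital signs": 10,
    "Laboratory tests": 76,
    "Microbiology": 48,
    "Ventilation": 26,
    "Medications": 112,
    "Procedures": 8,
}


def find_concept_set(vocabulary: str, code: str):
    """Return the concept set containing this code, or None if absent."""
    for concept_set in dictionary:
        if concept_set.vocabulary == vocabulary and code in concept_set.codes:
            return concept_set
    return None


total = sum(CATEGORY_COUNTS.values())
print(total)
print(find_concept_set("LOINC", "2160-0").name)
```

The lookup takes the vocabulary together with the code because a bare code is ambiguous: the same string of digits could exist in more than one terminology.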
Linkr and terminologies
Linkr integrates standardized terminologies to facilitate multi-center research. Warehouse data can be aligned to standard codes (LOINC, SNOMED CT, ATC…), so that analysis scripts can be shared across hospitals and institutions can take part in international research networks.
Key takeaways
- Medical terminologies are standardized vocabularies that allow naming health data universally: ICD-10 for diagnoses, LOINC for laboratory tests, ATC for medications, SNOMED CT for the full range of clinical concepts.
- Each terminology has its own area of expertise — they are complementary, not competing.
- Mapping (matching local codes to standard codes) is substantial expert work, but it is a durable investment that makes data usable at scale.
- Multi-center projects rely on standardized data dictionaries to precisely define which variables to collect and with which codes.
References
[1] Rodrigues JM et al. Classification, Ontology, and Precision Medicine. JMIR Med Inform. 2019. PMC6503847
[2] SNOMED International. What is the difference between a classification and a terminology? SNOMED International FAQ
[3] Steindel SJ. ICD-10: History and Context. J AHIMA. 2012. PMC7960170
[4] Wikipedia. ICD-10. en.wikipedia.org/wiki/ICD-10
[5] Horsky J et al. Accuracy and Completeness of Clinical Coding Using ICD-10 for Ambulatory Visits. AMIA Annu Symp Proc. 2017. PMC5977598
[6] Regenstrief Institute. LOINC 30th Anniversary. regenstrief.org
[7] Bodenreider O et al. SNOMED CT: A Clinical Terminology but Also a Formal Ontology. JBCS. 2023. scirp.org
[8] NLM. Overview of SNOMED CT. nlm.nih.gov
[9] IMO Health. SNOMED CT 101: A 2025 Guide. imohealth.com
[10] WHO. History of ATC/DDD. who.int
[11] Wikipedia. ATC Classification System. en.wikipedia.org
[12] Reich C et al. OHDSI Standardized Vocabularies — a large-scale centralized reference ontology for international data harmonization. JAMIA. 2024;31(3):583-590. PMC10873827
[13] FDA. Identification of Medicinal Products (IDMP). fda.gov
[14] UMC. IDMP — Global product and substance identifiers. who-umc.org