Speaking the same language: medical terminologies

TL;DR

Every hospital names and codes its data differently. To compare data across institutions or participate in multi-center studies, everyone needs to speak the same language. That’s the role of standardized medical terminologies: shared vocabularies, each specialized for a specific type of data — diagnoses, laboratory tests, medications, clinical procedures.

The problem: every hospital has its own language

In the previous article, we saw that warehouse data is organized in linked tables. In the laboratory table, for example, a parameter column contains the test name: “Creatinine,” “White blood cells,” “Glucose”…

These names are not universal. From one hospital to another, the same creatinine test can be called:

Hospital	Label in the system	Unit
Hospital A	Creatinine	µmol/L
Hospital B	S-Creat	mg/dL
Clinic C	Creat. serum	µmol/L

Three different names, two different units — for the same test.
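Before names can even be compared, the units have to agree. The sketch below converts the three creatinine results to a common unit. It is only an illustration: the numeric values are invented, the function name is hypothetical, and the conversion factor (1 mg/dL of creatinine ≈ 88.4 µmol/L) is the usual rounded value.

```python
# Rounded conversion factor for creatinine: 1 mg/dL is about 88.4 µmol/L.
MGDL_TO_UMOLL = 88.4


def creatinine_to_umol_per_l(value: float, unit: str) -> float:
    """Return a creatinine value expressed in µmol/L."""
    if unit == "µmol/L":
        return value
    if unit == "mg/dL":
        return value * MGDL_TO_UMOLL
    raise ValueError(f"Unsupported unit: {unit!r}")


# Invented example results, one per site from the table above.
results = [
    ("Hospital A", "Creatinine", 88.4, "µmol/L"),
    ("Hospital B", "S-Creat", 1.0, "mg/dL"),
    ("Clinic C", "Creat. serum", 90.0, "µmol/L"),
]

# Same unit everywhere; the local labels still differ and need mapping.
harmonized = [
    (site, round(creatinine_to_umol_per_l(value, unit), 1))
    for site, _label, value, unit in results
]
```

Unit conversion solves only half of the problem: the three local labels still have to be recognized as the same test.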

As long as you work within a single hospital, this is not a problem: everyone knows that “S-Creat” means creatinine. As soon as you want to compare data across hospitals for a multi-center study, you need to agree on a common language.

That is exactly the role of medical terminologies.

Classification, terminology, ontology?

These three terms describe increasing levels of complexity ^[1,2]:

Classification: a system that organizes concepts into categories to describe a domain in a structured way. Some (such as ICD-10) require mutually exclusive categories to avoid double-counting.
Terminology: a structured list of concepts with definitions and relationships between them — a concept can have multiple parents (example: LOINC).
Ontology: goes further by adding formal semantic relationships (description logic) that enable automated machine reasoning (example: SNOMED CT).

A few terminologies to know

ICD-10 — The classification of diagnoses

If you are a clinician working in a hospital, you probably already know ICD-10 — you use it every time you code a hospital stay.

ID card

Full name: International Classification of Diseases, 10th Revision

Type: Classification

Created by: World Health Organization (WHO)

First version: Endorsed by WHO in 1990, published 1992, effective 1993 ^[3]

Domain: Diagnoses

Number of codes: ~14,400 (WHO base version), up to ~70,000 with national extensions (e.g., ICD-10-CM in the US) ^[4]

Example

E11.9

Type 2 diabetes mellitus, without complications

E	Endocrine diseases
11	Type 2 diabetes
.9	Without complications

Structure: each code starts with a letter (which broadly indicates the chapter, that is, the major disease category), followed by two digits (the category), then optionally a decimal point and additional digits (for precision). For example, codes beginning with the letter I cover diseases of the circulatory system (chapter IX), and code I21.0 designates an acute myocardial infarction of the anterior wall.
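The letter, two digits, optional decimal layout is regular enough to check with a short pattern. The following is a minimal sketch, assuming the WHO base format only; national extensions such as ICD-10-CM allow longer codes, and the function name `parse_icd10` is hypothetical.

```python
import re

# Letter, two digits (category), optional ".x" subdivision.
# Simplified on purpose: national extensions may use longer codes.
ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d{1,2}))?$")


def parse_icd10(code: str) -> dict:
    """Split an ICD-10 code into its letter, category and subdivision."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognized ICD-10 code: {code!r}")
    letter, digits, subdivision = match.groups()
    return {
        "letter": letter,
        "category": letter + digits,
        "subdivision": subdivision,  # None when the code has no decimal part
    }


diabetes = parse_icd10("E11.9")
infarction = parse_icd10("I21.0")
```

Grouping by the `category` field (E11, I21) is a simple way to count stays per disease family regardless of the decimal subdivision.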

Strengths: it is the most widely used classification in the world. Virtually every country uses it. Hospital administrative data (DRG-based systems worldwide) rely on ICD-10 to describe hospital stays and calculate reimbursements.

Limitations: ICD-10 was designed for mortality statistics and billing, not for clinical research. Its granularity is low: it captures neither severity, nor timing, nor the degree of certainty of a diagnosis. The codes assigned reflect what was documented for reimbursement, not necessarily the full clinical picture.

ICD-10 and research: beware of bias

ICD-10 codes are often incomplete or imprecise. One study of ambulatory visits found that only 56% of entered diagnostic codes were appropriate, and that 26% of relevant diagnoses were simply not coded ^[5]. For research purposes, ICD-10 data is a starting point, but it is not always sufficient.

What about ICD-11?

The 11th revision of the ICD was adopted by the World Health Assembly in 2019 and came into effect in January 2022. It significantly improves the ontological structure and granularity. However, international adoption remains very limited: most healthcare systems still use ICD-10.

LOINC — The language of laboratory medicine

ID card

Full name: Logical Observation Identifiers Names and Codes

Type: Terminology

Created by: Regenstrief Institute (Indianapolis, USA)

First version: 1994 ^[6]

Domain: Laboratory tests, clinical observations, scores, questionnaires

Number of codes: > 100,000 ^[6]

Example

2160-0

Serum / plasma creatinine

LOINC addresses a simple problem: when a laboratory sends a test result, what exactly was measured? Each LOINC code uniquely identifies a test through six dimensions:

Dimension	Question	Example
Component	What is being measured?	Creatinine
Property	What type of measurement?	Mass concentration
Timing	At what point in time?	Point in time
System	From which specimen?	Serum / Plasma
Scale	Quantitative or qualitative?	Quantitative
Method	Which analytical technique?	(not specified)

The six dimensions of a LOINC code, illustrated with serum creatinine.
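The six dimensions map naturally onto a small record. The sketch below is one possible representation, not LOINC's own data model: the class and field names are hypothetical, and the abbreviations (MCnc, Pt, Ser/Plas, Qn) are the short forms LOINC uses for mass concentration, point in time, serum or plasma, and quantitative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoincParts:
    """The six dimensions that together identify a LOINC concept."""

    component: str          # what is being measured
    measured_property: str  # type of measurement
    timing: str             # point in time or interval
    system: str             # specimen
    scale: str              # quantitative, ordinal, nominal...
    method: str = ""        # often left unspecified

    def fully_specified_name(self) -> str:
        """Join the parts with colons, skipping an empty method."""
        parts = [
            self.component,
            self.measured_property,
            self.timing,
            self.system,
            self.scale,
        ]
        if self.method:
            parts.append(self.method)
        return ":".join(parts)


# Serum creatinine, as in the table above (method not specified).
creatinine = LoincParts("Creatinine", "MCnc", "Pt", "Ser/Plas", "Qn")
name = creatinine.fully_specified_name()
```

Two tests are the same LOINC concept only if all six parts agree, which is why a change of specimen or property leads to a different code.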

LOINC covers laboratory medicine (chemistry, hematology, microbiology, serology, toxicology), but also clinical observations (scores like SOFA or Glasgow, imaging results, quality-of-life questionnaires) and vital signs.

The problem LOINC solves

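In one hospital, the creatinine test is called “CREA.” In another, “Serum Creatinine.” In a third, “S-Creat.” All three tests correspond to the same LOINC code: 2160-0, provided each result is reported as a mass concentration (for example in mg/dL); a result reported in µmol/L is a molar concentration and belongs to a separate LOINC code. By aligning (or mapping) local codes to LOINC, you can compare laboratory results across institutions, even if each uses a different software system.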
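In code, this alignment is essentially a lookup table. The sketch below assumes three hypothetical sites that all report the same kind of measurement; the site names, values and the `pool_by_loinc` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping table: (site, local code) -> LOINC code.
LOCAL_TO_LOINC = {
    ("Hospital 1", "CREA"): "2160-0",
    ("Hospital 2", "Serum Creatinine"): "2160-0",
    ("Hospital 3", "S-Creat"): "2160-0",
}

# Invented observations: (site, local code, value).
observations = [
    ("Hospital 1", "CREA", 80.0),
    ("Hospital 2", "Serum Creatinine", 95.0),
    ("Hospital 3", "S-Creat", 102.0),
    ("Hospital 3", "XYZ", 1.0),  # a local code nobody has mapped yet
]


def pool_by_loinc(rows):
    """Group values by LOINC code and list the local codes left unmapped."""
    pooled = defaultdict(list)
    unmapped = []
    for site, local_code, value in rows:
        loinc = LOCAL_TO_LOINC.get((site, local_code))
        if loinc is None:
            unmapped.append((site, local_code))
        else:
            pooled[loinc].append(value)
    return dict(pooled), unmapped


pooled, unmapped = pool_by_loinc(observations)
```

Keying the table on the pair (site, local code) rather than on the local code alone avoids collisions when two hospitals reuse the same abbreviation for different tests.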

SNOMED CT — The most comprehensive clinical vocabulary

ID card

Full name: Systematized Nomenclature of Medicine — Clinical Terms

Type: Terminology and ontology ^[7]

Maintained by: SNOMED International (formerly IHTSDO)

First version: 2002 (merger of SNOMED RT and Clinical Terms Version 3) ^[8]

Domain: Clinical conditions, procedures, anatomy, organisms, substances…

Number of concepts: ~370,000 active concepts ^[9]

Example

44054006

Type 2 diabetes mellitus

Where ICD-10 classifies diseases into broad categories, SNOMED CT is a true clinical vocabulary: it describes the medical world in all its richness. It covers not only clinical conditions (diseases, symptoms), but also procedures, anatomy, organisms (bacteria, viruses), substances, medical devices…

The distinctive feature of SNOMED CT is its polyhierarchy: a concept can belong to multiple categories simultaneously. For example, myocardial infarction is both a “cardiac disorder” and an “ischemic disorder.” This structure enables powerful queries: you can search for all cardiac disorders at once, and myocardial infarction will be included automatically.
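A polyhierarchy can be pictured as a graph in which each concept lists all of its parents. The sketch below uses a handful of simplified labels rather than real SNOMED CT identifiers or its actual hierarchy; the `descendants` helper is hypothetical.

```python
# Simplified, illustrative hierarchy: concept -> set of direct parents.
PARENTS = {
    "Myocardial infarction": {"Cardiac disorder", "Ischemic disorder"},
    "Heart failure": {"Cardiac disorder"},
    "Ischemic stroke": {"Ischemic disorder"},
    "Cardiac disorder": {"Disorder"},
    "Ischemic disorder": {"Disorder"},
}


def descendants(ancestor: str) -> set:
    """Return every concept reachable from `ancestor` by following child links."""
    found = set()
    frontier = [ancestor]
    while frontier:
        current = frontier.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


cardiac = descendants("Cardiac disorder")
ischemic = descendants("Ischemic disorder")
```

Myocardial infarction appears in both result sets because it has two parents, which is exactly what a strict single-parent classification cannot express.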

SNOMED CT is also an ontology: its concepts are defined by formal semantic relationships (description logic), which enables a machine to automatically reason over the data ^[7].

SNOMED CT vs ICD-10: complementary, not competing

ICD-10 is a classification designed for counting and billing. SNOMED CT is a terminology / ontology designed for detailed clinical description. A hospital can use ICD-10 for administrative coding and SNOMED CT for research — they are not alternatives but complementary tools, each with its own purpose ^[2].

ATC — The classification of medications

ID card

Full name: Anatomical Therapeutic Chemical Classification System

Type: Classification

Created by: WHO Collaborating Centre for Drug Statistics Methodology (Oslo, Norway)

First version: 1976 ^[10]

Domain: Medications (active substances)

Number of codes: > 6,300 substances at the finest level ^[11]

Example

C10AA01

Simvastatin

The ATC system classifies medications according to a five-level hierarchy, from the most general to the most specific:

Level	Code	Meaning
1 — Anatomical system	C	Cardiovascular system
2 — Therapeutic subgroup	C10	Lipid modifying agents
3 — Pharmacological subgroup	C10A	Lipid modifying agents, plain
4 — Chemical subgroup	C10AA	HMG-CoA reductase inhibitors (statins)
5 — Active substance	C10AA01	Simvastatin

The ATC hierarchy, from target organ to molecule.

This hierarchy is very practical for research: you can search for all patients on “statins” (level 4: C10AA) without having to list each molecule individually (simvastatin, atorvastatin, rosuvastatin…).
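Because each level is a prefix of the next, a class-level query reduces to a prefix test. The sketch below assumes complete seven-character codes; the helper names and the small prescription list are invented for illustration (C10AA05 is atorvastatin, C10AA07 rosuvastatin, C10AX09 ezetimibe, J01CA04 amoxicillin).

```python
# Number of characters that make up each of the five ATC levels.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list:
    """Return the five nested groups of a full ATC code."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"Expected a 7-character ATC code, got {code!r}")
    return [code[:length] for length in ATC_LEVEL_LENGTHS]


def in_atc_group(code: str, group: str) -> bool:
    """True when `code` belongs to the ATC group `group` (any level)."""
    return code.strip().upper().startswith(group.strip().upper())


# Invented list of prescribed substances.
prescriptions = ["C10AA01", "C10AA05", "C10AA07", "J01CA04", "C10AX09"]
statins = [code for code in prescriptions if in_atc_group(code, "C10AA")]
```

The same prefix test works at any level, for instance `in_atc_group(code, "C10")` for all lipid modifying agents.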

ATC in practice

A researcher wants to study antibiotic use in an ICU. Rather than searching for each molecule one by one, they use ATC code J01 (antibacterials for systemic use) and automatically capture every systemic antibacterial, regardless of which specific molecule was prescribed. Antibiotics classified outside J01, such as topical preparations, require additional codes.

RxNorm — The drug terminology

ID card

Full name: RxNorm

Type: Terminology

Created by: National Library of Medicine (NLM, USA)

Domain: Medications (active ingredients, strengths, dose forms, brand names)

Example

36567

Simvastatin (ingredient level; a specific clinical drug such as simvastatin 20 mg oral tablet has its own RxNorm identifier)

Where ATC classifies medications by therapeutic group (“statins”), RxNorm identifies the specific product: which active ingredient, at what dose, in what form. It is the reference terminology for describing prescriptions and dispensations.

RxNorm is produced by the NLM and covers drugs marketed in the United States. For medications from other countries, the OHDSI network created RxNorm Extension: an extension that uses the same structure to represent international products (brand names, strengths, and dose forms specific to each country).

ATC vs RxNorm: two levels of description

ATC and RxNorm are complementary. ATC answers the question “which therapeutic class does this medication belong to?” (e.g., C10AA = statins). RxNorm answers “which specific medication was prescribed?” (e.g., simvastatin 20 mg tablet). In a data warehouse, both are useful: ATC for class-level analyses, RxNorm for prescription details.

What about IDMP?

IDMP (Identification of Medicinal Products) is a set of five ISO standards (ISO 11615, 11616, 11238, 11239, 11240) aiming to create a universal identification system for medications — substances, dose forms, strengths, brand names ^[13]. The UMC (Uppsala Monitoring Centre), a WHO collaborating centre, is responsible for maintaining substance identifiers (GSID) and pharmaceutical product identifiers (PhPID) ^[14]. The EMA is progressively mandating IDMP in Europe (2025-2026 deadlines), and the FDA is working towards adoption in the United States ^[13]. However, implementation is still ongoing and these standards are not yet used in research data warehouses — for now, RxNorm remains the de facto standard in that context.

Overview: which terminology for which type of data?

Data type	Reference terminology	Example code
Diagnoses	ICD-10 (classification) / SNOMED CT (ontology)	E11.9 / 44054006
Laboratory tests	LOINC (terminology)	2160-0
Medications	ATC (classification) / RxNorm + RxNorm Extension (terminology)	C10AA01 / 36567
Vital signs	LOINC / SNOMED CT	8867-4 (heart rate)
Procedures	SNOMED CT (ontology)	80146002 (appendectomy)

Each terminology has its area of expertise. In practice, a data warehouse contains codes from multiple terminologies simultaneously — ICD-10 for diagnoses, local laboratory codes (to be aligned to LOINC), national medication codes (to be aligned to ATC or RxNorm), etc.

The challenge of mapping

Knowing that LOINC or SNOMED CT exist is not enough. The real challenge is matching each hospital’s local codes with standardized codes. This process is called mapping.

Mapping: from local code to standard code

Site	Local code	Standard code (LOINC)
Hospital A	CREA	2160-0
Hospital B	Serum Creatinine	2160-0

Two different local codes → one single standard code. That’s mapping.

This mapping work is substantial. A hospital may have hundreds of local laboratory codes to map to LOINC, thousands of diagnostic codes to map to ICD-10 or SNOMED CT, and just as many medication codes to map to ATC or RxNorm.

Expert work, not a click

Mapping cannot be fully automated. It requires both technical expertise (understanding the terminologies) and clinical expertise (understanding what the local code actually represents). It is similar to the data quality work described in article 4: laborious, but durable.
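Software can still help by proposing candidates for an expert to accept or reject. The sketch below uses fuzzy string matching from Python's standard library; the reference labels are simplified stand-ins rather than official LOINC names, and `suggest_candidates` is a hypothetical helper. Nothing it returns should be treated as a validated mapping.

```python
import difflib

# Simplified reference labels (not official LOINC names) -> LOINC code.
REFERENCE_LABELS = {
    "creatinine serum": "2160-0",
    "heart rate": "8867-4",
}


def suggest_candidates(local_label: str, n: int = 3, cutoff: float = 0.6) -> list:
    """Return (reference label, code) pairs that resemble the local label.

    These are suggestions for human review, ordered from most to least similar.
    """
    matches = difflib.get_close_matches(
        local_label.lower(), list(REFERENCE_LABELS), n=n, cutoff=cutoff
    )
    return [(label, REFERENCE_LABELS[label]) for label in matches]


candidates = suggest_candidates("Creat. serum")
nothing = suggest_candidates("XYZ-123")
```

String similarity knows nothing about specimens, units or methods, so a plausible-looking candidate can still be clinically wrong; that is the part only a reviewer can settle.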

ATHENA: the reference dictionary

To facilitate this mapping work, the OHDSI network (Observational Health Data Sciences and Informatics) maintains a tool called ATHENA — a browser that brings together more than 10 million concepts from 136 different vocabularies, with over 28 million relationships between them ^[12].

ATHENA allows you to search for a concept (for example, “creatinine”), see all corresponding codes across all terminologies, and explore hierarchical relationships (what are the parent concepts? the child concepts?). It is the reference tool for anyone working on health data standardization.
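The same kind of keyword search can be run offline on an exported vocabulary file. The sketch below assumes a hypothetical tab-separated export with three columns (`concept_name`, `vocabulary_id`, `concept_code`); the column names and sample rows are assumptions made for illustration and may not match what ATHENA actually provides.

```python
import csv
import io

# Hypothetical tab-separated export; a real file would be read from disk.
SAMPLE_EXPORT = (
    "concept_name\tvocabulary_id\tconcept_code\n"
    "Creatinine [Mass/volume] in Serum or Plasma\tLOINC\t2160-0\n"
    "Heart rate\tLOINC\t8867-4\n"
    "Type 2 diabetes mellitus\tSNOMED\t44054006\n"
    "simvastatin\tATC\tC10AA01\n"
)


def search_concepts(export_text: str, keyword: str) -> list:
    """Return the rows whose concept name contains the keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t")
    needle = keyword.lower()
    return [row for row in reader if needle in row["concept_name"].lower()]


hits = search_concepts(SAMPLE_EXPORT, "creatinine")
diabetes_hits = search_concepts(SAMPLE_EXPORT, "DIABETES")
```

A substring search like this only finds candidates by name; exploring parent and child concepts requires the relationship data as well.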

In practice: data dictionaries

When a multi-center research project starts, the first step is to define a data dictionary: the list of variables to collect, with the standard code to use for each one.

A good example is the INDICATE project (A Federated Infrastructure for ICU Data Across Europe), launched in 2024. This European project aims to connect ICU data from multiple countries through a federated infrastructure — data stays within each hospital, but analyses can be executed in a coordinated manner.

To achieve this, INDICATE defined a data dictionary of 332 concept sets, organized into nine categories:

Category	Concept sets
Demographics & encounters	14
Clinical conditions	17
Clinical observations	21
Vital signs	10
Laboratory tests	76
Microbiology	48
Ventilation	26
Medications	112
Procedures	8

The 332 concept sets from the INDICATE data dictionary, organized by category.

Each concept set is defined using SNOMED CT codes (for conditions and observations), LOINC (for laboratory tests), and RxNorm (for medications). Every hospital participating in the project knows exactly what data to provide and in what format — that is the power of a standardized data dictionary.
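A data dictionary of this kind can be represented as a list of concept sets, each tied to a vocabulary and one or more codes. The sketch below is not the INDICATE dictionary itself: the three concept sets reuse example codes from this article, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSet:
    name: str
    category: str
    vocabulary: str
    codes: frozenset


# Illustrative entries only, built from the example codes used in this article.
DICTIONARY = [
    ConceptSet("Serum creatinine", "Laboratory tests", "LOINC", frozenset({"2160-0"})),
    ConceptSet("Type 2 diabetes mellitus", "Clinical conditions", "SNOMED CT", frozenset({"44054006"})),
    ConceptSet("Simvastatin", "Medications", "RxNorm", frozenset({"36567"})),
]


def missing_concept_sets(dictionary, available_codes: dict) -> list:
    """List the concept sets for which a site has no matching standard code.

    `available_codes` maps a vocabulary name to the set of codes the site
    can provide after mapping its local codes.
    """
    return [
        concept_set.name
        for concept_set in dictionary
        if not (concept_set.codes & available_codes.get(concept_set.vocabulary, set()))
    ]


# A hypothetical site that has mapped its labs and diagnoses, but not its drugs.
site_codes = {"LOINC": {"2160-0", "8867-4"}, "SNOMED CT": {"44054006"}}
gaps = missing_concept_sets(DICTIONARY, site_codes)
```

Running such a check at each site before the study starts shows which mappings are still missing, well before any analysis is executed.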

Linkr and terminologies

Linkr integrates standardized terminologies to facilitate multi-center research. Warehouse data can be aligned to standard codes (LOINC, SNOMED CT, ATC…), enabling analysis scripts to be shared across hospitals and participation in international research networks.

Key takeaways

Medical terminologies are standardized vocabularies that allow naming health data universally: ICD-10 for diagnoses, LOINC for laboratory tests, ATC for medications, SNOMED CT for the full range of clinical concepts.
Each terminology has its own area of expertise — they are complementary, not competing.
Mapping (matching local codes to standard codes) is substantial expert work, but it is a durable investment that makes data usable at scale.
Multi-center projects rely on standardized data dictionaries to precisely define which variables to collect and with which codes.

References

[1] Rodrigues JM et al. Classification, Ontology, and Precision Medicine. JMIR Med Inform. 2019. PMC6503847

[2] SNOMED International. What is the difference between a classification and a terminology? SNOMED International FAQ

[3] Steindel SJ. ICD-10: History and Context. J AHIMA. 2012. PMC7960170

[4] Wikipedia. ICD-10. en.wikipedia.org/wiki/ICD-10

[5] Horsky J et al. Accuracy and Completeness of Clinical Coding Using ICD-10 for Ambulatory Visits. AMIA Annu Symp Proc. 2017. PMC5977598

[6] Regenstrief Institute. LOINC 30th Anniversary. regenstrief.org

[7] Bodenreider O et al. SNOMED CT: A Clinical Terminology but Also a Formal Ontology. JBCS. 2023. scirp.org

[8] NLM. Overview of SNOMED CT. nlm.nih.gov

[9] IMO Health. SNOMED CT 101: A 2025 Guide. imohealth.com

[10] WHO. History of ATC/DDD. who.int

[11] Wikipedia. ATC Classification System. en.wikipedia.org

[12] Reich C et al. OHDSI Standardized Vocabularies — a large-scale centralized reference ontology for international data harmonization. JAMIA. 2024;31(3):583-590. PMC10873827

[13] FDA. Identification of Medicinal Products (IDMP). fda.gov

[14] UMC. IDMP — Global product and substance identifiers. who-umc.org