Summary
The OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) has become the global standard for structuring observational healthcare data. Born from an American pharmacovigilance project in 2008, it is now carried by the OHDSI community — an open network of over 4,700 collaborators across 88 countries, representing nearly one billion patient records.
What you already know
If you’ve followed the articles in the Understanding the basics section, you already have a good intuition for the problems OMOP solves:
- Healthcare data is scattered across multiple hospital software systems, each with its own format (article 1).
- To use it for research, data must be gathered into a clinical data warehouse through an ETL process (article 3).
- This data is organized in related tables in long format (article 5).
- And each hospital uses different codes for the same thing — hence the need for standardized terminologies (article 6).
OMOP addresses all of these problems with a single common data model that defines both the structure of the tables (structural interoperability) and the vocabularies to use (semantic interoperability).
But to understand why OMOP prevailed, we need to go back to its origins.
The origins: the OMOP project (2008–2013)
A pharmacovigilance problem
In 2007, the antidiabetic drug rosiglitazone (Avandia) was the subject of a major safety alert: a meta-analysis published in the New England Journal of Medicine (Nissen & Wolski, 2007) suggested an increased cardiovascular risk. The drug was being prescribed to millions of patients worldwide at the time.
This case highlighted a fundamental question: can healthcare data already collected in routine care (insurance databases, hospital records, registries) be used to detect adverse drug effects before a disaster occurs?
The creation of the OMOP project
In this context, the OMOP (Observational Medical Outcomes Partnership) project was launched in 2008 as a public-private partnership coordinated by the FNIH (Foundation for the National Institutes of Health) and overseen by the FDA (Food and Drug Administration).
The goal was ambitious: scientifically evaluate whether observational data analysis methods could reliably identify adverse drug effects — and if so, which methods worked best.
What is observational data?
Unlike data from a randomized clinical trial (where patients are assigned to a treatment), observational data comes from routine care: electronic health records, claims databases, registries, etc. It reflects real-world medical practice but is more prone to bias.
The Common Data Model
To compare methods across different databases, the OMOP researchers needed a common format. They therefore created the Common Data Model (CDM): a relational schema into which all partner databases had to transform their data.
This model was designed with clear principles:
- Patient-centric: every clinical event is linked to a patient and dated.
- Semantically standardized: local codes from each hospital are converted to standard vocabularies (SNOMED CT for diagnoses, RxNorm for drugs, LOINC for lab tests…).
- Technology-agnostic: the model works on any relational database (PostgreSQL, SQL Server, Oracle…).
- Traceable: original source codes are preserved alongside standard codes, allowing verification.
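The principles above can be sketched with a tiny in-memory database. This is a minimal illustration, not the full CDM: the three tables carry only a handful of the columns the real specification defines, and the concept identifier and codes shown are illustrative values that should be checked against ATHENA before any real use.

```python
import sqlite3

# In-memory database holding a heavily simplified subset of the OMOP CDM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id     INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    vocabulary_id TEXT,
    concept_code  TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER REFERENCES person(person_id),
    condition_concept_id    INTEGER REFERENCES concept(concept_id),
    condition_start_date    TEXT,
    condition_source_value  TEXT   -- original local code, kept for traceability
);
""")

# One patient, one standard concept, one dated clinical event.
# The source system recorded an ICD-10 code; the ETL mapped it to a SNOMED concept.
conn.execute("INSERT INTO person VALUES (1, 1960)")
conn.execute(
    "INSERT INTO concept VALUES (201826, 'Type 2 diabetes mellitus', 'SNOMED', '44054006')"
)
conn.execute(
    "INSERT INTO condition_occurrence VALUES (1, 1, 201826, '2021-03-15', 'E11.9')"
)

# Every event is tied to a patient and a date, and exposes both the
# standard concept and the original source code.
row = conn.execute("""
    SELECT p.person_id, co.condition_start_date,
           c.concept_name, c.vocabulary_id, co.condition_source_value
    FROM condition_occurrence AS co
    JOIN person  AS p ON p.person_id  = co.person_id
    JOIN concept AS c ON c.concept_id = co.condition_concept_id
""").fetchone()
print(row)
```

Because the query uses only standard SQL joins, the same statement would run on any relational engine, which is the point of the technology-agnostic principle.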
The results
The OMOP project produced major results between 2010 and 2013:
- OMOP Experiment (Ryan et al., 2012): a systematic evaluation of safety signal detection methods across 10 databases and 399 drug-adverse effect pairs. The study showed that some methods performed significantly better than others.
- OMOP CDM v4: the first mature version of the data model, used by project partners.
- Standardized vocabularies: a mapping system between local codes and standard terminologies, managed through the ATHENA tool.
When the OMOP project reached the end of its mandate in 2013, the community that had formed around it refused to disband. It was about to give rise to something much bigger.
The birth of OHDSI (2014)
From a project to a community
In 2014, the researchers and institutions involved in OMOP founded OHDSI (Observational Health Data Sciences and Informatics, pronounced “Odyssey”) — an open international community with its coordinating center at Columbia University (New York).
OHDSI's mission
“To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.”
Unlike the OMOP project, which was a funded research program with a start and end date, OHDSI is designed as a permanent community, open to all, with no membership fees, and founded on clear principles:
- Openness: all methods, tools, and results are publicly accessible.
- Reproducibility: analyses must be reproducible and well-calibrated.
- Collaboration: priorities are defined collectively.
- Innovation: encouragement of novel methodological approaches.
- Beneficence: protection of participants’ rights.
The distributed research model
One of OHDSI’s most innovative aspects is its federated research model:
- Each institution keeps its data locally — patient data never leaves the hospital.
- A study protocol and analysis code are shared with partners.
- Each partner executes the code on their own data.
- Only aggregated results (no individual data) are shared for synthesis.
This model respects data sovereignty and patient privacy, while enabling studies at a scale otherwise impossible. It is fully compatible with regulations like GDPR in Europe.
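The federated workflow described above can be sketched in a few lines. This is a deliberately simplified illustration: the function names, the `had_event` field, and the two toy hospitals are all hypothetical, and real OHDSI network studies rely on dedicated analysis packages and more careful evidence synthesis than the naive pooling shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResult:
    """Aggregate output of one site: counts only, no patient-level rows."""
    site: str
    n_patients: int
    n_events: int


def run_local_analysis(site, patient_rows):
    """Shared analysis code, executed inside each institution.

    It reads row-level data locally and returns only aggregated counts.
    """
    n_patients = len(patient_rows)
    n_events = sum(1 for row in patient_rows if row["had_event"])
    return SiteResult(site, n_patients, n_events)


def pool(results):
    """Coordinating center: combine the aggregates from all sites."""
    total_patients = sum(r.n_patients for r in results)
    total_events = sum(r.n_events for r in results)
    return total_events / total_patients


# Toy row-level data that never leaves each (hypothetical) hospital.
hospital_a = [{"had_event": True}] + [{"had_event": False}] * 3
hospital_b = [{"had_event": True}] * 2 + [{"had_event": False}] * 4

# Only SiteResult objects are sent back for synthesis.
results = [
    run_local_analysis("hospital_a", hospital_a),
    run_local_analysis("hospital_b", hospital_b),
]
pooled_rate = pool(results)
print(results, pooled_rate)
```

The key design point is the boundary: `run_local_analysis` is the only function that touches individual records, and its return type cannot carry them.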
OHDSI today: a global community
The numbers
In 2026, OHDSI represents:
- 4,700+ collaborators across 88 countries
- 544 standardized data sources in 54 countries
- 974 million+ unique patient records mapped to the OMOP CDM format
- A network spanning all continents
Regional chapters
The community has organized into regional chapters:
- OHDSI Europe — based at Erasmus MC (Rotterdam), with an annual symposium
- OHDSI Asia-Pacific — 7 chapters: Australia, China, India, Japan, Singapore, South Korea, Taiwan
- Active communities in Latin America and Africa
The community in action
OHDSI is driven by multiple channels of collaboration:
- The OHDSI Forums (forums.ohdsi.org): the main discussion platform
- Weekly community calls: research presentations, tool demos, methodological debates
- Annual symposia: scientific conferences with plenary sessions, posters, hands-on tutorials
- Study-a-thons and hack-a-thons: intensive collaborative work sessions
- 20+ working groups: CDM & Vocabularies, Estimation, Prediction, ATLAS, NLP, Genomics, FHIR, Data Quality…
Studies that changed the game
Treatment pathways (2015)
OHDSI’s first major network study examined treatment pathways for three chronic diseases — diabetes, depression, and hypertension — across 11 data sources and 250 million patients. Published in the Proceedings of the National Academy of Sciences (Hripcsak et al., 2016), the study revealed surprising geographic variations in first-line treatment choices.
LEGEND
The LEGEND program (Large-scale Evidence Generation and Evaluation across a Network of Databases) introduced a new paradigm: instead of comparing two treatments at a time, LEGEND compares all treatments for a disease simultaneously, across all relevant clinical outcomes. For type 2 diabetes alone (LEGEND-T2DM), the study covered 190 million patients. A major result from the hypertension study, published in The Lancet, showed that the world’s most prescribed antihypertensive was not the most effective.
COVID-19 (March 2020)
In March 2020, the OHDSI community organized a COVID-19 study-a-thon: 330+ participants from 30 countries worked for 88 hours to produce study protocols, cohorts, and analyses. Among the results:
- The hydroxychloroquine safety study covered 956,374 users across 14 data sources in 6 countries. Published in The Lancet Rheumatology (Lane et al., 2020), it was cited by the European Medicines Agency (EMA) in a warning about serious side effects.
- The COVER prediction model was the first COVID-19 prediction model developed and validated by OHDSI.
The open source tool ecosystem
OHDSI doesn’t just offer a data model — it’s a complete ecosystem of open source tools:
ATLAS
Web platform for designing cohorts, characterizing populations, estimating effects, and predicting clinical outcomes — without writing code.
ACHILLES
Automated characterization and quality control tool for OMOP CDM databases.
ATHENA
Reference dictionary: 10 million+ medical concepts from 136 vocabularies, with their relationships and hierarchies.
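To give an idea of how concept hierarchies are used in practice, here is a minimal sketch built on simplified versions of the CDM's `concept` and `concept_ancestor` tables. The concept identifiers below are made up for the example and do not correspond to real ATHENA entries; only the table and column names follow the CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id   INTEGER PRIMARY KEY,
    concept_name TEXT
);
CREATE TABLE concept_ancestor (
    ancestor_concept_id      INTEGER,
    descendant_concept_id    INTEGER,
    min_levels_of_separation INTEGER
);
""")

# Hypothetical identifiers, for illustration only.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    (1001, "Diabetes mellitus"),
    (1002, "Type 2 diabetes mellitus"),
    (1003, "Type 1 diabetes mellitus"),
    (2001, "Essential hypertension"),
])

# Each concept is listed as its own ancestor at distance 0,
# plus one row per (ancestor, descendant) pair in the hierarchy.
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?)", [
    (1001, 1001, 0), (1001, 1002, 1), (1001, 1003, 1),
    (1002, 1002, 0), (1003, 1003, 0), (2001, 2001, 0),
])

# "Diabetes mellitus and everything below it" becomes a single join.
descendants = [name for (name,) in conn.execute("""
    SELECT c.concept_name
    FROM concept_ancestor AS ca
    JOIN concept AS c ON c.concept_id = ca.descendant_concept_id
    WHERE ca.ancestor_concept_id = ?
    ORDER BY c.concept_name
""", (1001,))]
print(descendants)
```

Precomputing ancestor-descendant pairs this way means a cohort definition can target a broad clinical idea without enumerating every specific code.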
HADES
Collection of R packages for large-scale analysis: characterization, population-level estimation, patient-level prediction.
CDM versions
The data model has evolved over time:
| Version | Year | Key evolution |
|---|---|---|
| v4 | 2012 | First mature version, used in the OMOP project |
| v5.0 | 2014 | Major redesign at OHDSI’s creation, added cost and note tables |
| v5.2 | 2017 | Added SURVEY_CONDUCT, cost table improvements |
| v5.3 | 2018 | Added VISIT_DETAIL, stabilization |
| v5.4 | 2021 | Current version: added EPISODE and EPISODE_EVENT tables |
v5.4 is the version currently supported by all OHDSI tools. A new v5-series release is planned for 2026.
Major projects around OMOP
OMOP adoption has gone well beyond the academic community. National institutions and major international projects have adopted it.
In Europe
EHDEN (European Health Data & Evidence Network, 2018–2024) was the catalyst for OMOP adoption in Europe. With 31 million euros in funding from IMI2, this project harmonized 850 million+ records across 210 data sources in 30 countries. EHDEN trained and certified 64 SMEs to support hospitals in transforming their data. The project became a permanent foundation in 2024.
DARWIN EU (Data Analysis and Real World Interrogation Network) is the European Medicines Agency’s real-world data network, operational since 2022. With 30 partners in 16 European countries and 180 million patients, it produces regulatory studies in an average of 4 months — an unprecedented timeline. It is the first Real World Evidence network directly integrated into European pharmaceutical regulation.
The EHDS (European Health Data Space), whose regulation was adopted on February 11, 2025 and entered into force on March 26, 2025, positions OMOP as a key interoperability standard for the secondary use of health data in Europe.
Other European projects have also adopted OMOP:
- PIONEER (IMI2): 3.5 million prostate cancer patients
- HARMONY (IMI/IHI): 120,000+ hematology records
- BigData@Heart (IMI, 2017–2023): 5 million+ cardiovascular patients
- INDICATE (EIT Health, 2024): federated infrastructure for ICU data
In the United States
All of Us (NIH) is one of the world’s largest precision medicine programs, with 700,000+ participants whose EHR data is harmonized to OMOP CDM.
CHoRUS (NIH Bridge2AI, 2022) brings together 14 US hospitals around a multimodal dataset (EHR, waveforms, imaging) of 50,000 ICU admissions, with 1.6 billion rows in OMOP format.
National initiatives
| Country | Initiative | Scale |
|---|---|---|
| France | Health Data Hub — SNDS to OMOP conversion | 3M patient sample |
| South Korea | HIRA K-OMOP — national claims data | 56.4M patients (entire population) |
| United Kingdom | NHS SDEs — OMOP adopted as standard | National Secure Data Environment network |
| Canada | Health Data Research Network Canada | 4 provinces |
| Australia | Patron — primary care database | 2M patients, 140+ practices |
| Singapore | Ministerial collaboration | National research platform |
The FHIR–OMOP convergence
Two standards dominate the healthcare data world today:
- FHIR (Fast Healthcare Interoperability Resources): the standard for real-time data exchange in care settings (prescriptions, lab results, system transfers).
- OMOP CDM: the standard for large-scale analysis of observational data in research.
These two standards are complementary, not competing. A FHIR-to-OMOP Implementation Guide is currently being standardized through HL7, with a ballot in September 2025. The goal: facilitate automatic transformation of data from FHIR to OMOP for research.
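As a rough illustration of what such a transformation involves, the sketch below turns one FHIR `Condition` resource into an OMOP-style `condition_occurrence` row. It is not the mapping defined by the Implementation Guide: the two lookup tables, the function name, and the identifiers are assumptions made for the example, and a real pipeline would resolve concepts against the OMOP vocabulary tables.

```python
from datetime import datetime

SNOMED_SYSTEM = "http://snomed.info/sct"

# Hypothetical lookups; a real ETL would query the vocabulary tables
# and a patient identity mapping instead of hard-coded dictionaries.
CONCEPT_LOOKUP = {(SNOMED_SYSTEM, "44054006"): 201826}
PERSON_LOOKUP = {"Patient/123": 1}


def fhir_condition_to_omop(resource):
    """Map a FHIR Condition resource (as a dict) to a condition_occurrence row."""
    if resource.get("resourceType") != "Condition":
        raise ValueError("expected a FHIR Condition resource")

    coding = resource["code"]["coding"][0]
    # Concept 0 is used here for codes with no standard mapping.
    concept_id = CONCEPT_LOOKUP.get((coding["system"], coding["code"]), 0)

    # FHIR carries a full timestamp; this sketch keeps only the date.
    onset = datetime.fromisoformat(resource["onsetDateTime"].replace("Z", "+00:00"))

    return {
        "person_id": PERSON_LOOKUP[resource["subject"]["reference"]],
        "condition_concept_id": concept_id,
        "condition_start_date": onset.date().isoformat(),
        "condition_source_value": coding["code"],
    }


fhir_condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{
        "system": SNOMED_SYSTEM,
        "code": "44054006",
        "display": "Type 2 diabetes mellitus",
    }]},
    "onsetDateTime": "2021-03-15T10:00:00Z",
}

omop_row = fhir_condition_to_omop(fhir_condition)
print(omop_row)
```

The example shows why the two standards complement each other: FHIR describes one exchanged resource at a time, while the OMOP row is shaped for storage in an analytical table.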
Where does Linkr fit in?
Linkr natively integrates the OMOP CDM model. The platform allows clinicians to work with OMOP-formatted data without needing to master SQL or the technical details of the model — while offering data scientists full access to the CDM for advanced analyses.
- OMOP was born in 2008 from a US pharmacovigilance project (FDA/FNIH) and evolved into a global standard carried by the OHDSI community since 2014.
- OHDSI brings together 4,700+ collaborators across 88 countries, with 974 million+ standardized patient records.
- The federated model ensures data never leaves the hospital — only code and aggregated results are shared.
- Institutional adoption is accelerating: EMA (DARWIN EU), NHS, NIH (All of Us), Health Data Hub, EHDS.
- The open source tool ecosystem (ATLAS, ACHILLES, ATHENA, HADES) makes OMOP accessible to all.