Linkr
Home Resources Tools Documentation Blog Demo
FR
  • Discovering OMOP
  • The OHDSI community
  • The OMOP CDM model
  • OMOP vocabularies
  • The OHDSI tools ecosystem
  • Building an OMOP ETL
  • SQL — Beginner level
  • SQL — Intermediate level
  • SQL — Advanced level
3/9
20 min Boris Delange · 2026/05/14

The OMOP CDM model

A tour of the OMOP CDM v5.4 data model: broad table categories, key conventions, source/standard duality.

Summary

OMOP CDM is a relational database model designed to make health data queryable in the same way across every hospital and every database that adopts it. It contains around forty tables organized into six broad categories — clinical, health system, vocabularies, health economics, derived elements, metadata — all built around a single principle: every clinical event is tied to a person and a date. Version 5.4, stable since 2021, is the reference version in the OHDSI community today.

What you already know

The previous articles in this section laid out the why of OMOP (the original project, its community, the idea of a shared format). This article opens the how: what the model actually looks like when you examine its structure.

To follow, you just need a few notions covered in the Understanding health data warehouses section:

  • A relational database is a collection of tables linked together by keys — shared columns whose value identifies the same object from one table to the next (for example person_id, which appears in every event table to tie each row back to a unique patient in the PERSON table). That’s exactly the logic described in the article How health data is organized.
  • Health data is structured in long format: one row per event, columns to qualify that event.
  • Medical codes (ICD-10, SNOMED CT, LOINC…) are kept in a standardized form to enable comparability across databases.

OMOP CDM applies these principles rigorously: a precisely-specified relational schema, identical from one institution to the next. That schema is what we detail from here on.

A brief history of versions

The Common Data Model isn’t frozen: it evolves through versions published by OHDSI on GitHub. Knowing the history helps you understand the version you’re using — and the ones you should avoid.

5.3

CDM v5.3 — January 2019

A stable version for two and a half years, which served as the foundation for many OMOP warehouses built during that period. Still found in databases that haven’t migrated to 5.4.

5.4

CDM v5.4 — September 2021

Current

Adds the EPISODE and EPISODE_EVENT tables, brings COHORT into the model, and adds several fields to existing tables (latitude/longitude in LOCATION, etc.). The full list of changes is documented here. This is today’s reference version — stable for over four years, which is one of OMOP’s strengths.

6.0

CDM v6.0 — October 2021, paused

This release made datetime fields (with a precise time) mandatory where they had previously been optional. The official README states that v6.0 “is not fully supported by the OHDSI suite of tools and methods” and explicitly asks institutions to stay on v5.4 until the community has settled the role of dates vs datetimes.

5.5

CDM v5.5 — expected August 2026

An incremental evolution is in the works, tracked publicly in the official GitHub milestone, due August 2026. On the menu, among others: a new PACK_CONTENT table that describes the composition of drug packs (which active ingredients, in what quantity, in what packaging), a VALUE_AS_DATE field in OBSERVATION, and several foreign-key fixes. We’ll update this article when the version ships.

The broad table categories

The CDM v5.4 organizes its ~40 tables into six broad categories. Not all are used in every database: most hospital warehouses focus on clinical data and the health system.

Standardized clinical data

The patients themselves, their hospitalizations, diagnoses, medications, lab tests and medical procedures. This is the core of the model — what’s observed about patients.

Standardized vocabularies

The medical concepts dictionary: codes, labels, relationships between concepts, and hierarchies. Covered in the next article.

Standardized health system

The care context: the hospital units, healthcare professionals, and geographical locations where care took place.

Standardized derived elements

Pre-computed aggregates from clinical data, built to speed up recurrent analyses (for example, collapsing several successive prescriptions into a single continuous exposure period).

Standardized health economics

Costs and reimbursements associated with care, mostly populated in claims databases.

Standardized metadata

Documentation of the warehouse itself: which CDM version is used, what scope it covers, and technical information about the instance.

The interactive explorer

To see the details of each table (columns, data types, primary and foreign keys, constraints), use our interactive OMOP v5.4 schema explorer. It covers all tables in the model with cross-links between them.

Three fundamental conventions

Before diving into the tables one by one, three principles run through the whole model. Understanding them will save you a lot of time in practice.

1. Person-centric model

Every record in every event table is tied to a patient identifier (person_id) and at least one date. This is what guarantees that any analysis can produce a longitudinal view of a patient’s health.

The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum.

— Book of OHDSI, section 4.1 Design Principles

2. Source / standard duality

This is probably the most important OMOP convention to understand. For every clinical event, the model keeps two parallel representations:

  • The original source code as it exists in the hospital system (for example an ICD-10 code for a diagnosis).
  • The standardized OMOP concept that the source code has been mapped to (for example the equivalent SNOMED CT concept).

Concretely, every event table follows the same column pattern:

- <event>_concept_id        : standard OMOP concept identifier
- <event>_source_value      : raw source code/label
- <event>_source_concept_id : OMOP concept matching the source code

A concrete example — a heart rate in MEASUREMENT

Suppose an ICU bedside monitor records heart rate under the local label “HR bpm”, with no structured code. Here’s what the row looks like after the OMOP ETL:

ColumnValueReading
measurement_concept_id3027018The OMOP standard concept for Heart rate (mapped to LOINC 8867-4).
measurement_source_valueHR bpmThe label as it appeared in the source system, kept for traceability.
measurement_source_concept_id0”HR bpm” isn’t a recognized structured code: no source concept to attach, hence 0.

If the source had used the LOINC code 8867-4 directly, then measurement_source_concept_id would also have been 3027018: LOINC being a standard OMOP vocabulary, the source code and the standard concept coincide.

Why this dual representation? Because it gives both international comparability (via the standard concept) and traceability (by keeping the original source code you can always verify the data at the source).

Concept_id = 0, OMOP's 'NULL'

When a source code has no mapping to a standard concept, its <event>_concept_id is set to 0. It’s OMOP’s “flavor of NULL” — it flags an unstandardized concept without needing classic SQL NULL filtering. Quality checks routinely start by counting concept_id = 0.

3. type_concept_id: where does this record come from?

Many tables include a <event>_type_concept_id column. Watch out for the trap: this is not a categorization of the event itself, but an indication of its provenance.

For example, for a drug in DRUG_EXPOSURE, the drug_type_concept_id tells you whether the data comes from a prescription, a dispensing or an administration — not the nature of the drug.

This is useful for quality: you may want to restrict an analysis to drugs actually dispensed or administered, not merely prescribed.

With those three conventions in mind, let’s now look at the main tables of the model. We group them into two families: first the ones describing the patient and their journey through the healthcare system, then the ones describing the clinical events observed along that journey.

The patient and their journey

PERSON — the patients table

Role:
Patient demographics (sex, year of birth, origin).
One row per:
Unique patient in the database.
Connected to:
Every clinical table via person_id. It’s the anchor of the whole model.

Everything starts with PERSON. It’s the demographics table: one row per unique patient in the database.

ColumnContent
person_idUnique patient identifier (primary key)
gender_concept_idSex as an OMOP standard concept
year_of_birthYear of birth (only mandatory date-of-birth field)
month_of_birth, day_of_birthOptional, depending on source availability
race_concept_id, ethnicity_concept_idStandardized demographic categories
location_id, provider_id, care_site_idLinks to context tables

Note the nuance between year_of_birth (mandatory) and month_of_birth/day_of_birth (optional). The model accepts a partially-known birth date — common in claims databases.

Table preview

Three distinct patients, each with their own person_id. Two women and one man, born between 1958 and 1985. For patient 1042, the full date of birth is known (12 April 1958); for the others, only the year is recorded.

person_idgender_concept_idyear_of_birthmonth_of_birthday_of_birth
10428532 (Female)1958412
10438507 (Male)1972NULLNULL
10448532 (Female)1985NULLNULL

VISIT_OCCURRENCE — encounters with the health system

Role:
Describes each encounter between the patient and the healthcare system (hospitalization, outpatient visit, emergency…).
One row per:
Visit (a given patient usually has several over time).
Connected to:
PERSON, CARE_SITE, and every clinical event table referencing it via visit_occurrence_id.

Every encounter between a patient and the healthcare system (hospitalization, outpatient visit, emergency…) generates a row in VISIT_OCCURRENCE.

ColumnContent
visit_occurrence_idUnique visit identifier
person_idThe patient
visit_concept_idVisit type (inpatient, emergency, outpatient…)
visit_start_date, visit_end_dateTime boundaries
care_site_idCare unit or facility
visit_source_valueSource label for the visit

Table preview

Three visits for two patients. Patient 1042 was hospitalized for a week in March, then seen as an outpatient two months later. Patient 1043 visited the ER without being admitted (a single-day visit).

visit_occurrence_idperson_idvisit_concept_idvisit_start_datevisit_end_datecare_site_id
7820110429201 (Inpatient Visit)2022-03-142022-03-2112
7820210429202 (Outpatient Visit)2022-05-022022-05-0247
7820310439203 (Emergency Room Visit)2023-11-082023-11-088

VISIT_DETAIL — unit-level stays within a hospitalization

Role:
Breaks a hospitalization down by unit: emergency, ICU, medical ward, surgery…
One row per:
Stay in a unit (a long hospitalization generates several).
Connected to:
VISIT_OCCURRENCE (the parent stay), PERSON, and CARE_SITE.

During a hospitalization, a patient often moves through several units: emergency on arrival, ICU, then a medical ward for the remainder of the stay. VISIT_DETAIL captures these intermediate steps, each row pointing to the parent stay’s visit_occurrence_id — so you keep the overall view while still being able to analyse each unit-level stay.

ColumnContent
visit_detail_idUnique sub-stay identifier
visit_occurrence_idLink to the parent visit
person_idThe patient
visit_detail_concept_idSub-stay type (ICU, generic inpatient…)
visit_detail_start_date, visit_detail_end_dateTime boundaries
care_site_idSpecific unit (medical ward, surgery, cardiology…)
parent_visit_detail_idTo model hierarchical sub-stays

Table preview — visit 78201 broken down

Visit 78201 for patient 1042 (a week-long hospitalization) is broken into three sub-stays: emergency on admission day, three days in ICU, then four days on a medical ward. All three rows point to the same visit_occurrence_id.

Heads up: OMOP standardizes few hospital units — only ICU has its own concept (32037 Intensive Care). For other inpatient stays, you fall back to the generic 9201 Inpatient Visit, and the precise ward is carried by care_site_id.

visit_detail_idvisit_occurrence_idperson_idvisit_detail_concept_idvisit_detail_start_datevisit_detail_end_datecare_site_id
910017820110429203 (Emergency Room Visit)2022-03-142022-03-148
9100278201104232037 (ICU)2022-03-142022-03-1723
910037820110429201 (Inpatient Visit)2022-03-172022-03-2112

Clinical events

Systematic links back to the patient's journey

Every clinical event table that follows (MEASUREMENT, OBSERVATION, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE) carries the same foreign keys to the patient’s journey: person_id (to PERSON), visit_occurrence_id (to VISIT_OCCURRENCE) and visit_detail_id (to VISIT_DETAIL). That’s what lets you go, in a few joins, from any event back to its patient, its stay, and the unit where it took place.

MEASUREMENT — quantifiable values

Role:
Stores quantifiable clinical values: lab results, vital signs, clinical scores.
One row per:
Recorded measurement (the same parameter is measured many times during a stay).
Connected to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and LOINC on the standard side.

MEASUREMENT holds everything that’s measured with a value — number, interval, or coded categorical result.

ColumnContent
measurement_idUnique measurement identifier
person_idThe patient
measurement_concept_idStandard concept (most often LOINC)
measurement_date, measurement_datetimeDate (and time if available) of the measurement
value_as_numberNumeric value
value_as_concept_idCoded value (for qualitative results: positive/negative, etc.)
unit_concept_idUnit of the value
range_low, range_highReference range bounds
visit_occurrence_id, visit_detail_idContext of the measurement
measurement_source_valueSource label

Table preview

Three measurements taken for patient 1042 during her hospitalization: a heart rate (88 bpm), a low-grade fever (38.6 °C), and a serum urea nitrogen. Each row pairs a LOINC concept with its numeric value and unit.

measurement_idperson_idmeasurement_concept_idmeasurement_datevalue_as_numberunit_concept_id
22084110423027018 (Heart rate)2022-03-14888541 (/min)
22084210423020891 (Body temperature)2022-03-1438.6586323 (°C)
22084310423013682 (Urea nitrogen)2022-03-157.28753 (mmol/L)

OBSERVATION — clinical facts without a numeric value

Role:
Stores clinical facts without a natural numeric value: clinical signs, lifestyle, reason for visit.
One row per:
Observed clinical fact for a patient.
Connected to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and most often SNOMED CT.

OBSERVATION is where clinical events that don’t fit any other domain table land: smoking status, occupation, reason for visit, family history, qualitative score, environmental exposure… When a concept doesn’t belong to the Condition, Drug, Procedure or Measurement domains, this is where it ends up.

ColumnContent
observation_idUnique observation identifier
person_idThe patient
observation_concept_idStandard concept (most often SNOMED CT)
observation_date, observation_datetimeDate (and time if available) of the observation
value_as_numberNumeric value if relevant
value_as_stringText value (free-text answer)
value_as_concept_idCoded value
qualifier_concept_idQualifier for the observation
visit_occurrence_id, visit_detail_idContext
observation_source_valueSource label

A frequent question: where does a given piece of data go — MEASUREMENT or OBSERVATION? The official rule is that the concept’s domain determines the table — a Measurement-domain concept goes to MEASUREMENT, an Observation-domain concept goes to OBSERVATION. ATHENA tells you the domain of each concept.

Table preview

Three clinical facts without a numeric value: a smoking status for patient 1042 (Yes), a more nuanced one for patient 1043 (former smoker), and patient 1044’s reason for visit (check-up). The value_as_string column stores the free-text answer here.

observation_idperson_idobservation_concept_idobservation_datevalue_as_string
404001104240766306 (Smoked ≥ 100 cig.)2022-03-14Yes
40400210434275495 (Tobacco — finding)2023-11-08Former smoker
40400310444203722 (Reason for visit)2024-02-19Routine check-up

CONDITION_OCCURRENCE — diagnoses

Role:
Lists every diagnosis recorded for a patient (principal, comorbidities, problem list…).
One row per:
Recorded diagnosis (a visit can generate several rows).
Connected to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, ICD-10 on the source side).

Every diagnosis given for a patient becomes a row in CONDITION_OCCURRENCE. A single admission can generate multiple rows (a principal diagnosis plus multiple comorbidities).

ColumnContent
condition_occurrence_idUnique diagnosis identifier
person_idThe patient
condition_concept_idStandard concept (most often SNOMED CT)
condition_start_dateStart date
condition_end_dateEnd date if known (chronic or resolved)
visit_occurrence_idLink to the associated visit
condition_source_valueSource code (often ICD-10)
condition_source_concept_idConcept matching the source code
condition_type_concept_idProvenance: principal diagnosis, comorbidity, problem list…

Table preview

Three diagnoses. Patient 1042 has type 2 diabetes known since 2019 and hypertension diagnosed during her 2022 hospitalization. Patient 1043 had a myocardial infarction in November 2023. Each row keeps the ICD-10 source code alongside the standard SNOMED CT concept.

condition_occurrence_idperson_idcondition_concept_idcondition_start_datevisit_occurrence_idcondition_source_value
6012011042201826 (T2 diabetes)2019-06-2278101E11.9
6012021042320128 (Essential HTN)2022-03-1478201I10
60120310434329847 (Myocardial inf.)2023-11-0878203I21.4

PROCEDURE_OCCURRENCE — medical procedures

Role:
Lists the medical procedures performed on a patient: surgeries, imaging studies, technical acts, etc.
One row per:
Recorded procedure.
Connected to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, CCAM or CPT-4 on the source side depending on the country).

PROCEDURE_OCCURRENCE documents everything that is done to the patient, as opposed to what is observed about them (CONDITION_OCCURRENCE) or given to them (DRUG_EXPOSURE).

ColumnContent
procedure_occurrence_idUnique procedure identifier
person_idThe patient
procedure_concept_idStandard concept (most often SNOMED CT)
procedure_date, procedure_datetimeDate (and time if available) of the procedure
procedure_end_date, procedure_end_datetimeEnd date (and time), for procedures that span time
visit_occurrence_id, visit_detail_idContext
quantityNumber of procedures
procedure_source_valueSource code (CCAM in France, CPT-4 in the US, ICD-10-PCS…)
procedure_source_concept_idConcept matching the source code
procedure_type_concept_idProvenance of the record

Table preview

Three medical procedures. For the first two — patient 1042’s appendectomy and patient 1043’s hemodialysis session — exact start and end times are recorded (datetime fields), letting you compute the actual procedure duration. Patient 1044’s blood draw, much shorter, has no end time recorded. The procedure_source_value keeps the source code (CCAM in France).

procedure_occurrence_idperson_idprocedure_concept_idprocedure_datetimeprocedure_end_datetimevisit_occurrence_idprocedure_source_value
81050110424198190 (Appendectomy)2022-03-15 09:302022-03-15 11:0078201HHFA002
810502104337397396 (Hemodialysis)2023-11-09 08:002023-11-09 12:1578203JVJF002
81050310444332170 (Venipuncture)2024-02-19 09:15NULL78305HSHF002

DEVICE_EXPOSURE — medical devices

Role:
Records exposure of the patient to a physical object used for diagnostic or therapeutic purposes: implants (stents, prostheses), care equipment (catheters, tubes), consumables (syringes, dressings).
One row per:
Device exposure.
Linked to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and SNOMED CT on the standard side (Device domain).

DEVICE_EXPOSURE stores what is applied to the patient when it’s a physical object — as opposed to a drug (DRUG_EXPOSURE) or an action (PROCEDURE_OCCURRENCE). The boundary with PROCEDURE_OCCURRENCE can blur: implanting a stent is a procedure, but the stent itself is a device. Both tables often coexist for the same episode of care.

ColumnContents
device_exposure_idUnique identifier of the exposure
person_idThe patient
device_concept_idStandard concept (SNOMED CT, Device domain)
device_exposure_start_date, device_exposure_end_dateExposure period
unique_device_id, production_idFDA Unique Device Identifier (UDI) if available
quantityNumber of devices used
visit_occurrence_id, visit_detail_idContext
device_source_valueSource label or code
device_type_concept_idProvenance (claims, EHR…)

Table preview

Three devices. During her ICU stay, patient 1042 had a central venous catheter (in place for three days) and a urinary catheter. Patient 1043 received a coronary stent during his emergency visit for myocardial infarction — an implant, with no end date.

device_exposure_idperson_iddevice_concept_iddevice_exposure_start_datedevice_exposure_end_datevisit_occurrence_id
11500110424160023 (Central venous catheter)2022-03-142022-03-1778201
11500210424047968 (Urinary catheter)2022-03-142022-03-1678201
11500310434252356 (Coronary artery stent)2023-11-08NULL78203

NOTE — free-text clinical documents

Role:
Captures the free-text documents produced by clinicians: discharge summaries, consultation letters, clinical observations, operative reports…
One row per:
Text document produced for a patient.
Linked to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, PROVIDER (the author), and LOINC (Document Ontology) on the standard side.

NOTE holds the unstructured data: everything the other tables don’t capture because it’s free-form writing. It’s the raw material for NLP, whose results land in a dedicated table NOTE_NLP.

ColumnContents
note_idUnique identifier of the document
person_idThe patient
note_date, note_datetimeDate (and time if known) the note was written
note_class_concept_idLOINC document-type classification (admission note, discharge summary…)
note_titleDocument title
note_textFull document content (text)
encoding_concept_idCharacter encoding (typically UTF-8)
language_concept_idDocument language
provider_idAuthor of the document
visit_occurrence_id, visit_detail_idContext
note_type_concept_idProvenance (EHR, dictation, import…)

Table preview

Three notes produced during patient 1042’s hospitalization: an admission note written in the emergency department, an operative report after the appendectomy, and a discharge summary. The full content sits in note_text (truncated here for display). Each row carries its LOINC classification, which lets you filter by document type — for instance, all discharge summaries from a given unit.

note_idperson_idnote_datenote_class_concept_idnote_titlevisit_occurrence_id
70010110422022-03-14706470 (Admission note)Admission note78201
70010210422022-03-15706526 (Operative note)Operative report — appendectomy78201
70010310422022-03-21706473 (Discharge summary)Discharge summary78201

Working with free text

Raw note_text is rarely usable as-is for analysis: it needs to be structured. That’s the role of NLP — extraction of clinical entities, negations, relations — whose results go in NOTE_NLP, one row per extracted term.

DRUG_EXPOSURE — medications

Role:
Covers every drug exposure: prescriptions, pharmacy dispensings, in-hospital administrations.
One row per:
Drug exposure (a chronic treatment generates several rows over time).
Connected to:
PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and RxNorm on the standard side.

DRUG_EXPOSURE covers all drug exposures: prescriptions, dispensings, in-hospital administrations. drug_type_concept_id distinguishes these provenances.

ColumnContent
drug_exposure_idUnique identifier
person_idThe patient
drug_concept_idStandard concept (most often RxNorm)
drug_exposure_start_date, drug_exposure_end_dateExposure period
quantity, days_supply, dose_unit_source_valueDosage
route_concept_idRoute of administration
drug_source_valueSource label
drug_type_concept_idRecord type (prescription, dispensing, etc.)

Table preview

Three drug exposures with different provenances: a celecoxib prescription for patient 1042 at hospital discharge, an in-hospital ibuprofen administration the day after the appendectomy, and a pharmacy dispensing of an analgesic for patient 1043. drug_type_concept_id distinguishes the three cases.

drug_exposure_idperson_iddrug_concept_iddrug_exposure_start_datequantitydrug_type_concept_id
92030110421118084 (celecoxib)2022-03-153032838 (prescription)
920302104219078461 (ibuprofen 200 mg)2022-03-16132818 (administration)
920303104319133768 (paracetamol+hydrocodone)2023-11-102032825 (dispensing)

Computing the actual dose received

A quirk of OMOP: the dose taken by the patient is not stored directly in DRUG_EXPOSURE. The quantity column only gives the number of units dispensed (tablets, vials, millilitres…). To go from a number of units to an actual dose in milligrams, you have to join DRUG_EXPOSURE with the DRUG_STRENGTH table, which holds the active ingredient concentration for every standard drug product.

Outpatient example: a treatment over several days

For patient 1042’s prescription of 30 celecoxib 200 mg tablets over 7 days (row 920301 in the preview above):

  • DRUG_EXPOSURE.quantity = 30 (tablets dispensed)
  • DRUG_STRENGTH.amount_value = 200 (mg per tablet for the standard product)
  • DRUG_EXPOSURE.days_supply = 7 (days of treatment)

→ Total dose: 30 × 200 = 6,000 mg over the period → Daily dose: 6,000 ÷ 7 ≈ 857 mg/day (about 1.5 tablets per day)

Inpatient example: an IV infusion with a varying rate

In hospital settings, especially the ICU, you often get one row per administration — even one row per rate change for continuous infusions. This is very different from the outpatient model: here, it’s the drug_exposure_start_datetime and drug_exposure_end_datetime columns that carry the fine-grained timing, accurate to the minute.

Picture patient 1042 during her ICU sub-stay 91002 on IV norepinephrine (concept 37354531, concentration 0.128 mg/mL = 128 µg/mL). Three rate changes over the afternoon:

drug_exposure_start_datetimedrug_exposure_end_datetimequantity (mL infused)
2022-03-14 13:372022-03-14 14:012.0
2022-03-14 14:012022-03-14 16:2027.8
2022-03-14 16:202022-03-14 18:3017.3

Each row represents a period of stable rate (the rate itself isn’t stored as such — you derive it from the infused volume divided by the duration between start_datetime and end_datetime). The dose actually delivered is then reconstructed by multiplying the infused volume (quantity) by the product’s concentration (numerator_value / denominator_value in DRUG_STRENGTH).

→ On the middle row (27.8 mL at 0.128 mg/mL): 3.56 mg of norepinephrine delivered over 2h19, i.e. about 1.5 mg/h.

For an analysis spanning an entire ICU stay, you have to sum all of these rows by patient and by drug, factoring in the duration of each rate.

Which formula for which dose form?

The calculation rule depends on the dose form:

  • Tablet, capsule, single-dose vial: quantity × amount_value
  • Solution, infusion, suspension: quantity × numerator_value / denominator_value
  • Transdermal patch: use the rate (numerator_value per time unit) rather than a total quantity

Details in the official OMOP dose calculation docs. We’ll revisit DRUG_STRENGTH in more detail in the next article on vocabularies.

Time to get hands-on

Manipulating the data is by far the best way to make the OMOP model your own. To practise on the tables we’ve just walked through, we’ve put together a series of interactive SQL tutorials, runnable directly in your browser on a real OMOP sample — no setup required:

  • Beginner level — first SELECTs, first joins, counting patients, filtering by diagnosis.
  • Intermediate level — aggregations, time windows, working with source/standard duality.
  • Advanced level — cohorts, complex multi-table queries, dose calculations.

Going further

This article has covered the tables that matter most for clinical practice. The other tables (PROVIDER, CARE_SITE, LOCATION, COST, SPECIMEN, EPISODE, NOTE_NLP…) follow the same principles.

  • Our interactive OMOP v5.4 schema explorer gives column-by-column detail for every table.
  • Chapter 4 of the Book of OHDSI is the complete official documentation.
  • Precise conventions (date formats, constraints, allowed values) are documented in Themis, OHDSI’s official ETL conventions repository.
  • OMOP CDM is a relational model of about 40 tables, split into six categories. v5.4 has been stable since 2021.
  • Three conventions structure the whole model: person-centric (person_id + date everywhere), source/standard duality (concept_id vs source_value vs source_concept_id), and type_concept_id to trace the provenance of each record.
  • concept_id = 0 flags an unmapped source code. It's the simplest quality indicator to watch first.
  • PERSON anchors the whole model; VISIT_OCCURRENCE orchestrates encounters with the health system; VISIT_DETAIL captures unit-level stays (ICU, medical ward…).
  • Seven clinical event tables, one per domain: CONDITION_OCCURRENCE (diagnoses), DRUG_EXPOSURE (medications), PROCEDURE_OCCURRENCE (procedures), MEASUREMENT (quantifiable values), OBSERVATION (clinical facts without a natural value), DEVICE_EXPOSURE (medical devices) and NOTE (free text).
  • The dose actually administered isn't stored in DRUG_EXPOSURE: you have to join with DRUG_STRENGTH.
  • Column-by-column detail is in the interactive explorer; precise conventions (dates, constraints, allowed values) live in Themis. And above all: get hands-on with the SQL tutorials to really internalize the model.
PreviousThe OHDSI communityNextOMOP vocabularies

Product

  • Home
  • Demo

Resources

  • Documentation
  • Resources
  • Tools
  • Blog

Community

  • Framagit source code
  • Github source code

About

  • InterHop.org
  • Contact

2021–2026 InterHop — CC BY-NC-SA 4.0 (site) · GPLv3 (software)