Summary
OMOP CDM is a relational database model designed to make health data queryable in the same way across every hospital and every database that adopts it. It contains around forty tables organized into six broad categories — clinical, health system, vocabularies, health economics, derived elements, metadata — all built around a single principle: every clinical event is tied to a person and a date. Version 5.4, stable since 2021, is the reference version in the OHDSI community today.
What you already know
The previous articles in this section laid out the why of OMOP (the original project, its community, the idea of a shared format). This article opens the how: what the model actually looks like when you examine its structure.
To follow, you just need a few notions covered in the Understanding health data warehouses section:
- A relational database is a collection of tables linked together by keys — shared columns whose value identifies the same object from one table to the next (for example
person_id, which appears in every event table to tie each row back to a unique patient in thePERSONtable). That’s exactly the logic described in the article How health data is organized. - Health data is structured in long format: one row per event, columns to qualify that event.
- Medical codes (ICD-10, SNOMED CT, LOINC…) are kept in a standardized form to enable comparability across databases.
OMOP CDM applies these principles rigorously: a precisely-specified relational schema, identical from one institution to the next. That schema is what we detail from here on.
A brief history of versions
The Common Data Model isn’t frozen: it evolves through versions published by OHDSI on GitHub. Knowing the history helps you understand the version you’re using — and the ones you should avoid.
CDM v5.3 — January 2019
A stable version for two and a half years, which served as the foundation for many OMOP warehouses built during that period. Still found in databases that haven’t migrated to 5.4.
CDM v5.4 — September 2021
CurrentAdds the EPISODE and EPISODE_EVENT tables, brings COHORT into the model, and adds several fields to existing tables (latitude/longitude in LOCATION, etc.). The full list of changes is documented here. This is today’s reference version — stable for over four years, which is one of OMOP’s strengths.
CDM v6.0 — October 2021, paused
This release made datetime fields (with a precise time) mandatory where they had previously been optional. The official README states that v6.0 “is not fully supported by the OHDSI suite of tools and methods” and explicitly asks institutions to stay on v5.4 until the community has settled the role of dates vs datetimes.
CDM v5.5 — expected August 2026
An incremental evolution is in the works, tracked publicly in the official GitHub milestone, due August 2026. On the menu, among others: a new PACK_CONTENT table that describes the composition of drug packs (which active ingredients, in what quantity, in what packaging), a VALUE_AS_DATE field in OBSERVATION, and several foreign-key fixes. We’ll update this article when the version ships.
The broad table categories
The CDM v5.4 organizes its ~40 tables into six broad categories. Not all are used in every database: most hospital warehouses focus on clinical data and the health system.
Standardized clinical data
The patients themselves, their hospitalizations, diagnoses, medications, lab tests and medical procedures. This is the core of the model — what’s observed about patients.
Standardized vocabularies
The medical concepts dictionary: codes, labels, relationships between concepts, and hierarchies. Covered in the next article.
Standardized health system
The care context: the hospital units, healthcare professionals, and geographical locations where care took place.
Standardized derived elements
Pre-computed aggregates from clinical data, built to speed up recurrent analyses (for example, collapsing several successive prescriptions into a single continuous exposure period).
Standardized health economics
Costs and reimbursements associated with care, mostly populated in claims databases.
Standardized metadata
Documentation of the warehouse itself: which CDM version is used, what scope it covers, and technical information about the instance.
The interactive explorer
To see the details of each table (columns, data types, primary and foreign keys, constraints), use our interactive OMOP v5.4 schema explorer. It covers all tables in the model with cross-links between them.
Three fundamental conventions
Before diving into the tables one by one, three principles run through the whole model. Understanding them will save you a lot of time in practice.
1. Person-centric model
Every record in every event table is tied to a patient identifier (person_id) and at least one date. This is what guarantees that any analysis can produce a longitudinal view of a patient’s health.
The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum.
2. Source / standard duality
This is probably the most important OMOP convention to understand. For every clinical event, the model keeps two parallel representations:
- The original source code as it exists in the hospital system (for example an ICD-10 code for a diagnosis).
- The standardized OMOP concept that the source code has been mapped to (for example the equivalent SNOMED CT concept).
Concretely, every event table follows the same column pattern:
- <event>_concept_id : standard OMOP concept identifier
- <event>_source_value : raw source code/label
- <event>_source_concept_id : OMOP concept matching the source code
A concrete example — a heart rate in MEASUREMENT
Suppose an ICU bedside monitor records heart rate under the local label “HR bpm”, with no structured code. Here’s what the row looks like after the OMOP ETL:
| Column | Value | Reading |
|---|---|---|
| measurement_concept_id | 3027018 | The OMOP standard concept for Heart rate (mapped to LOINC 8867-4). |
| measurement_source_value | HR bpm | The label as it appeared in the source system, kept for traceability. |
| measurement_source_concept_id | 0 | ”HR bpm” isn’t a recognized structured code: no source concept to attach, hence 0. |
If the source had used the LOINC code 8867-4 directly, then measurement_source_concept_id would also have been 3027018: LOINC being a standard OMOP vocabulary, the source code and the standard concept coincide.
Why this dual representation? Because it gives both international comparability (via the standard concept) and traceability (by keeping the original source code you can always verify the data at the source).
Concept_id = 0, OMOP's 'NULL'
When a source code has no mapping to a standard concept, its <event>_concept_id is set to 0. It’s OMOP’s “flavor of NULL” — it flags an unstandardized concept without needing classic SQL NULL filtering. Quality checks routinely start by counting concept_id = 0.
3. type_concept_id: where does this record come from?
Many tables include a <event>_type_concept_id column. Watch out for the trap: this is not a categorization of the event itself, but an indication of its provenance.
For example, for a drug in DRUG_EXPOSURE, the drug_type_concept_id tells you whether the data comes from a prescription, a dispensing or an administration — not the nature of the drug.
This is useful for quality: you may want to restrict an analysis to drugs actually dispensed or administered, not merely prescribed.
With those three conventions in mind, let’s now look at the main tables of the model. We group them into two families: first the ones describing the patient and their journey through the healthcare system, then the ones describing the clinical events observed along that journey.
The patient and their journey
PERSON — the patients table
- Role:
- Patient demographics (sex, year of birth, origin).
- One row per:
- Unique patient in the database.
- Connected to:
- Every clinical table via
person_id. It’s the anchor of the whole model.
Everything starts with PERSON. It’s the demographics table: one row per unique patient in the database.
| Column | Content |
|---|---|
person_id | Unique patient identifier (primary key) |
gender_concept_id | Sex as an OMOP standard concept |
year_of_birth | Year of birth (only mandatory date-of-birth field) |
month_of_birth, day_of_birth | Optional, depending on source availability |
race_concept_id, ethnicity_concept_id | Standardized demographic categories |
location_id, provider_id, care_site_id | Links to context tables |
Note the nuance between year_of_birth (mandatory) and month_of_birth/day_of_birth (optional). The model accepts a partially-known birth date — common in claims databases.
Table preview
Three distinct patients, each with their own person_id. Two women and one man, born between 1958 and 1985. For patient 1042, the full date of birth is known (12 April 1958); for the others, only the year is recorded.
| person_id | gender_concept_id | year_of_birth | month_of_birth | day_of_birth |
|---|---|---|---|---|
| 1042 | 8532 (Female) | 1958 | 4 | 12 |
| 1043 | 8507 (Male) | 1972 | NULL | NULL |
| 1044 | 8532 (Female) | 1985 | NULL | NULL |
VISIT_OCCURRENCE — encounters with the health system
- Role:
- Describes each encounter between the patient and the healthcare system (hospitalization, outpatient visit, emergency…).
- One row per:
- Visit (a given patient usually has several over time).
- Connected to:
PERSON,CARE_SITE, and every clinical event table referencing it viavisit_occurrence_id.
Every encounter between a patient and the healthcare system (hospitalization, outpatient visit, emergency…) generates a row in VISIT_OCCURRENCE.
| Column | Content |
|---|---|
visit_occurrence_id | Unique visit identifier |
person_id | The patient |
visit_concept_id | Visit type (inpatient, emergency, outpatient…) |
visit_start_date, visit_end_date | Time boundaries |
care_site_id | Care unit or facility |
visit_source_value | Source label for the visit |
Table preview
Three visits for two patients. Patient 1042 was hospitalized for a week in March, then seen as an outpatient two months later. Patient 1043 visited the ER without being admitted (a single-day visit).
| visit_occurrence_id | person_id | visit_concept_id | visit_start_date | visit_end_date | care_site_id |
|---|---|---|---|---|---|
| 78201 | 1042 | 9201 (Inpatient Visit) | 2022-03-14 | 2022-03-21 | 12 |
| 78202 | 1042 | 9202 (Outpatient Visit) | 2022-05-02 | 2022-05-02 | 47 |
| 78203 | 1043 | 9203 (Emergency Room Visit) | 2023-11-08 | 2023-11-08 | 8 |
VISIT_DETAIL — unit-level stays within a hospitalization
- Role:
- Breaks a hospitalization down by unit: emergency, ICU, medical ward, surgery…
- One row per:
- Stay in a unit (a long hospitalization generates several).
- Connected to:
VISIT_OCCURRENCE(the parent stay),PERSON, andCARE_SITE.
During a hospitalization, a patient often moves through several units: emergency on arrival, ICU, then a medical ward for the remainder of the stay. VISIT_DETAIL captures these intermediate steps, each row pointing to the parent stay’s visit_occurrence_id — so you keep the overall view while still being able to analyse each unit-level stay.
| Column | Content |
|---|---|
visit_detail_id | Unique sub-stay identifier |
visit_occurrence_id | Link to the parent visit |
person_id | The patient |
visit_detail_concept_id | Sub-stay type (ICU, generic inpatient…) |
visit_detail_start_date, visit_detail_end_date | Time boundaries |
care_site_id | Specific unit (medical ward, surgery, cardiology…) |
parent_visit_detail_id | To model hierarchical sub-stays |
Table preview — visit 78201 broken down
Visit 78201 for patient 1042 (a week-long hospitalization) is broken into three sub-stays: emergency on admission day, three days in ICU, then four days on a medical ward. All three rows point to the same visit_occurrence_id.
Heads up: OMOP standardizes few hospital units — only ICU has its own concept (32037 Intensive Care). For other inpatient stays, you fall back to the generic 9201 Inpatient Visit, and the precise ward is carried by care_site_id.
| visit_detail_id | visit_occurrence_id | person_id | visit_detail_concept_id | visit_detail_start_date | visit_detail_end_date | care_site_id |
|---|---|---|---|---|---|---|
| 91001 | 78201 | 1042 | 9203 (Emergency Room Visit) | 2022-03-14 | 2022-03-14 | 8 |
| 91002 | 78201 | 1042 | 32037 (ICU) | 2022-03-14 | 2022-03-17 | 23 |
| 91003 | 78201 | 1042 | 9201 (Inpatient Visit) | 2022-03-17 | 2022-03-21 | 12 |
Clinical events
Systematic links back to the patient's journey
Every clinical event table that follows (MEASUREMENT, OBSERVATION, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE) carries the same foreign keys to the patient’s journey: person_id (to PERSON), visit_occurrence_id (to VISIT_OCCURRENCE) and visit_detail_id (to VISIT_DETAIL). That’s what lets you go, in a few joins, from any event back to its patient, its stay, and the unit where it took place.
MEASUREMENT — quantifiable values
- Role:
- Stores quantifiable clinical values: lab results, vital signs, clinical scores.
- One row per:
- Recorded measurement (the same parameter is measured many times during a stay).
- Connected to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and LOINC on the standard side.
MEASUREMENT holds everything that’s measured with a value — number, interval, or coded categorical result.
| Column | Content |
|---|---|
measurement_id | Unique measurement identifier |
person_id | The patient |
measurement_concept_id | Standard concept (most often LOINC) |
measurement_date, measurement_datetime | Date (and time if available) of the measurement |
value_as_number | Numeric value |
value_as_concept_id | Coded value (for qualitative results: positive/negative, etc.) |
unit_concept_id | Unit of the value |
range_low, range_high | Reference range bounds |
visit_occurrence_id, visit_detail_id | Context of the measurement |
measurement_source_value | Source label |
Table preview
Three measurements taken for patient 1042 during her hospitalization: a heart rate (88 bpm), a low-grade fever (38.6 °C), and a serum urea nitrogen. Each row pairs a LOINC concept with its numeric value and unit.
| measurement_id | person_id | measurement_concept_id | measurement_date | value_as_number | unit_concept_id |
|---|---|---|---|---|---|
| 220841 | 1042 | 3027018 (Heart rate) | 2022-03-14 | 88 | 8541 (/min) |
| 220842 | 1042 | 3020891 (Body temperature) | 2022-03-14 | 38.6 | 586323 (°C) |
| 220843 | 1042 | 3013682 (Urea nitrogen) | 2022-03-15 | 7.2 | 8753 (mmol/L) |
OBSERVATION — clinical facts without a numeric value
- Role:
- Stores clinical facts without a natural numeric value: clinical signs, lifestyle, reason for visit.
- One row per:
- Observed clinical fact for a patient.
- Connected to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and most often SNOMED CT.
OBSERVATION is where clinical events that don’t fit any other domain table land: smoking status, occupation, reason for visit, family history, qualitative score, environmental exposure… When a concept doesn’t belong to the Condition, Drug, Procedure or Measurement domains, this is where it ends up.
| Column | Content |
|---|---|
observation_id | Unique observation identifier |
person_id | The patient |
observation_concept_id | Standard concept (most often SNOMED CT) |
observation_date, observation_datetime | Date (and time if available) of the observation |
value_as_number | Numeric value if relevant |
value_as_string | Text value (free-text answer) |
value_as_concept_id | Coded value |
qualifier_concept_id | Qualifier for the observation |
visit_occurrence_id, visit_detail_id | Context |
observation_source_value | Source label |
A frequent question: where does a given piece of data go — MEASUREMENT or OBSERVATION? The official rule is that the concept’s domain determines the table — a Measurement-domain concept goes to MEASUREMENT, an Observation-domain concept goes to OBSERVATION. ATHENA tells you the domain of each concept.
Table preview
Three clinical facts without a numeric value: a smoking status for patient 1042 (Yes), a more nuanced one for patient 1043 (former smoker), and patient 1044’s reason for visit (check-up). The value_as_string column stores the free-text answer here.
| observation_id | person_id | observation_concept_id | observation_date | value_as_string |
|---|---|---|---|---|
| 404001 | 1042 | 40766306 (Smoked ≥ 100 cig.) | 2022-03-14 | Yes |
| 404002 | 1043 | 4275495 (Tobacco — finding) | 2023-11-08 | Former smoker |
| 404003 | 1044 | 4203722 (Reason for visit) | 2024-02-19 | Routine check-up |
CONDITION_OCCURRENCE — diagnoses
- Role:
- Lists every diagnosis recorded for a patient (principal, comorbidities, problem list…).
- One row per:
- Recorded diagnosis (a visit can generate several rows).
- Connected to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, ICD-10 on the source side).
Every diagnosis given for a patient becomes a row in CONDITION_OCCURRENCE. A single admission can generate multiple rows (a principal diagnosis plus multiple comorbidities).
| Column | Content |
|---|---|
condition_occurrence_id | Unique diagnosis identifier |
person_id | The patient |
condition_concept_id | Standard concept (most often SNOMED CT) |
condition_start_date | Start date |
condition_end_date | End date if known (chronic or resolved) |
visit_occurrence_id | Link to the associated visit |
condition_source_value | Source code (often ICD-10) |
condition_source_concept_id | Concept matching the source code |
condition_type_concept_id | Provenance: principal diagnosis, comorbidity, problem list… |
Table preview
Three diagnoses. Patient 1042 has type 2 diabetes known since 2019 and hypertension diagnosed during her 2022 hospitalization. Patient 1043 had a myocardial infarction in November 2023. Each row keeps the ICD-10 source code alongside the standard SNOMED CT concept.
| condition_occurrence_id | person_id | condition_concept_id | condition_start_date | visit_occurrence_id | condition_source_value |
|---|---|---|---|---|---|
| 601201 | 1042 | 201826 (T2 diabetes) | 2019-06-22 | 78101 | E11.9 |
| 601202 | 1042 | 320128 (Essential HTN) | 2022-03-14 | 78201 | I10 |
| 601203 | 1043 | 4329847 (Myocardial inf.) | 2023-11-08 | 78203 | I21.4 |
PROCEDURE_OCCURRENCE — medical procedures
- Role:
- Lists the medical procedures performed on a patient: surgeries, imaging studies, technical acts, etc.
- One row per:
- Recorded procedure.
- Connected to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, CCAM or CPT-4 on the source side depending on the country).
PROCEDURE_OCCURRENCE documents everything that is done to the patient, as opposed to what is observed about them (CONDITION_OCCURRENCE) or given to them (DRUG_EXPOSURE).
| Column | Content |
|---|---|
procedure_occurrence_id | Unique procedure identifier |
person_id | The patient |
procedure_concept_id | Standard concept (most often SNOMED CT) |
procedure_date, procedure_datetime | Date (and time if available) of the procedure |
procedure_end_date, procedure_end_datetime | End date (and time), for procedures that span time |
visit_occurrence_id, visit_detail_id | Context |
quantity | Number of procedures |
procedure_source_value | Source code (CCAM in France, CPT-4 in the US, ICD-10-PCS…) |
procedure_source_concept_id | Concept matching the source code |
procedure_type_concept_id | Provenance of the record |
Table preview
Three medical procedures. For the first two — patient 1042’s appendectomy and patient 1043’s hemodialysis session — exact start and end times are recorded (datetime fields), letting you compute the actual procedure duration. Patient 1044’s blood draw, much shorter, has no end time recorded. The procedure_source_value keeps the source code (CCAM in France).
| procedure_occurrence_id | person_id | procedure_concept_id | procedure_datetime | procedure_end_datetime | visit_occurrence_id | procedure_source_value |
|---|---|---|---|---|---|---|
| 810501 | 1042 | 4198190 (Appendectomy) | 2022-03-15 09:30 | 2022-03-15 11:00 | 78201 | HHFA002 |
| 810502 | 1043 | 37397396 (Hemodialysis) | 2023-11-09 08:00 | 2023-11-09 12:15 | 78203 | JVJF002 |
| 810503 | 1044 | 4332170 (Venipuncture) | 2024-02-19 09:15 | NULL | 78305 | HSHF002 |
DEVICE_EXPOSURE — medical devices
- Role:
- Records exposure of the patient to a physical object used for diagnostic or therapeutic purposes: implants (stents, prostheses), care equipment (catheters, tubes), consumables (syringes, dressings).
- One row per:
- Device exposure.
- Linked to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and SNOMED CT on the standard side (Device domain).
DEVICE_EXPOSURE stores what is applied to the patient when it’s a physical object — as opposed to a drug (DRUG_EXPOSURE) or an action (PROCEDURE_OCCURRENCE). The boundary with PROCEDURE_OCCURRENCE can blur: implanting a stent is a procedure, but the stent itself is a device. Both tables often coexist for the same episode of care.
| Column | Contents |
|---|---|
device_exposure_id | Unique identifier of the exposure |
person_id | The patient |
device_concept_id | Standard concept (SNOMED CT, Device domain) |
device_exposure_start_date, device_exposure_end_date | Exposure period |
unique_device_id, production_id | FDA Unique Device Identifier (UDI) if available |
quantity | Number of devices used |
visit_occurrence_id, visit_detail_id | Context |
device_source_value | Source label or code |
device_type_concept_id | Provenance (claims, EHR…) |
Table preview
Three devices. During her ICU stay, patient 1042 had a central venous catheter (in place for three days) and a urinary catheter. Patient 1043 received a coronary stent during his emergency visit for myocardial infarction — an implant, with no end date.
| device_exposure_id | person_id | device_concept_id | device_exposure_start_date | device_exposure_end_date | visit_occurrence_id |
|---|---|---|---|---|---|
| 115001 | 1042 | 4160023 (Central venous catheter) | 2022-03-14 | 2022-03-17 | 78201 |
| 115002 | 1042 | 4047968 (Urinary catheter) | 2022-03-14 | 2022-03-16 | 78201 |
| 115003 | 1043 | 4252356 (Coronary artery stent) | 2023-11-08 | NULL | 78203 |
NOTE — free-text clinical documents
- Role:
- Captures the free-text documents produced by clinicians: discharge summaries, consultation letters, clinical observations, operative reports…
- One row per:
- Text document produced for a patient.
- Linked to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL,PROVIDER(the author), and LOINC (Document Ontology) on the standard side.
NOTE holds the unstructured data: everything the other tables don’t capture because it’s free-form writing. It’s the raw material for NLP, whose results land in a dedicated table NOTE_NLP.
| Column | Contents |
|---|---|
note_id | Unique identifier of the document |
person_id | The patient |
note_date, note_datetime | Date (and time if known) the note was written |
note_class_concept_id | LOINC document-type classification (admission note, discharge summary…) |
note_title | Document title |
note_text | Full document content (text) |
encoding_concept_id | Character encoding (typically UTF-8) |
language_concept_id | Document language |
provider_id | Author of the document |
visit_occurrence_id, visit_detail_id | Context |
note_type_concept_id | Provenance (EHR, dictation, import…) |
Table preview
Three notes produced during patient 1042’s hospitalization: an admission note written in the emergency department, an operative report after the appendectomy, and a discharge summary. The full content sits in note_text (truncated here for display). Each row carries its LOINC classification, which lets you filter by document type — for instance, all discharge summaries from a given unit.
| note_id | person_id | note_date | note_class_concept_id | note_title | visit_occurrence_id |
|---|---|---|---|---|---|
| 700101 | 1042 | 2022-03-14 | 706470 (Admission note) | Admission note | 78201 |
| 700102 | 1042 | 2022-03-15 | 706526 (Operative note) | Operative report — appendectomy | 78201 |
| 700103 | 1042 | 2022-03-21 | 706473 (Discharge summary) | Discharge summary | 78201 |
Working with free text
Raw note_text is rarely usable as-is for analysis: it needs to be structured. That’s the role of NLP — extraction of clinical entities, negations, relations — whose results go in NOTE_NLP, one row per extracted term.
DRUG_EXPOSURE — medications
- Role:
- Covers every drug exposure: prescriptions, pharmacy dispensings, in-hospital administrations.
- One row per:
- Drug exposure (a chronic treatment generates several rows over time).
- Connected to:
PERSON,VISIT_OCCURRENCE,VISIT_DETAIL, and RxNorm on the standard side.
DRUG_EXPOSURE covers all drug exposures: prescriptions, dispensings, in-hospital administrations. drug_type_concept_id distinguishes these provenances.
| Column | Content |
|---|---|
drug_exposure_id | Unique identifier |
person_id | The patient |
drug_concept_id | Standard concept (most often RxNorm) |
drug_exposure_start_date, drug_exposure_end_date | Exposure period |
quantity, days_supply, dose_unit_source_value | Dosage |
route_concept_id | Route of administration |
drug_source_value | Source label |
drug_type_concept_id | Record type (prescription, dispensing, etc.) |
Table preview
Three drug exposures with different provenances: a celecoxib prescription for patient 1042 at hospital discharge, an in-hospital ibuprofen administration the day after the appendectomy, and a pharmacy dispensing of an analgesic for patient 1043. drug_type_concept_id distinguishes the three cases.
| drug_exposure_id | person_id | drug_concept_id | drug_exposure_start_date | quantity | drug_type_concept_id |
|---|---|---|---|---|---|
| 920301 | 1042 | 1118084 (celecoxib) | 2022-03-15 | 30 | 32838 (prescription) |
| 920302 | 1042 | 19078461 (ibuprofen 200 mg) | 2022-03-16 | 1 | 32818 (administration) |
| 920303 | 1043 | 19133768 (paracetamol+hydrocodone) | 2023-11-10 | 20 | 32825 (dispensing) |
Computing the actual dose received
A quirk of OMOP: the dose taken by the patient is not stored directly in DRUG_EXPOSURE. The quantity column only gives the number of units dispensed (tablets, vials, millilitres…). To go from a number of units to an actual dose in milligrams, you have to join DRUG_EXPOSURE with the DRUG_STRENGTH table, which holds the active ingredient concentration for every standard drug product.
Outpatient example: a treatment over several days
For patient 1042’s prescription of 30 celecoxib 200 mg tablets over 7 days (row 920301 in the preview above):
DRUG_EXPOSURE.quantity=30(tablets dispensed)DRUG_STRENGTH.amount_value=200(mg per tablet for the standard product)DRUG_EXPOSURE.days_supply=7(days of treatment)
→ Total dose: 30 × 200 = 6,000 mg over the period
→ Daily dose: 6,000 ÷ 7 ≈ 857 mg/day (about 1.5 tablets per day)
Inpatient example: an IV infusion with a varying rate
In hospital settings, especially the ICU, you often get one row per administration — even one row per rate change for continuous infusions. This is very different from the outpatient model: here, it’s the drug_exposure_start_datetime and drug_exposure_end_datetime columns that carry the fine-grained timing, accurate to the minute.
Picture patient 1042 during her ICU sub-stay 91002 on IV norepinephrine (concept 37354531, concentration 0.128 mg/mL = 128 µg/mL). Three rate changes over the afternoon:
| drug_exposure_start_datetime | drug_exposure_end_datetime | quantity (mL infused) |
|---|---|---|
| 2022-03-14 13:37 | 2022-03-14 14:01 | 2.0 |
| 2022-03-14 14:01 | 2022-03-14 16:20 | 27.8 |
| 2022-03-14 16:20 | 2022-03-14 18:30 | 17.3 |
Each row represents a period of stable rate (the rate itself isn’t stored as such — you derive it from the infused volume divided by the duration between start_datetime and end_datetime). The dose actually delivered is then reconstructed by multiplying the infused volume (quantity) by the product’s concentration (numerator_value / denominator_value in DRUG_STRENGTH).
→ On the middle row (27.8 mL at 0.128 mg/mL): 3.56 mg of norepinephrine delivered over 2h19, i.e. about 1.5 mg/h.
For an analysis spanning an entire ICU stay, you have to sum all of these rows by patient and by drug, factoring in the duration of each rate.
Which formula for which dose form?
The calculation rule depends on the dose form:
- Tablet, capsule, single-dose vial:
quantity × amount_value - Solution, infusion, suspension:
quantity × numerator_value / denominator_value - Transdermal patch: use the rate (
numerator_valueper time unit) rather than a total quantity
Details in the official OMOP dose calculation docs. We’ll revisit DRUG_STRENGTH in more detail in the next article on vocabularies.
Time to get hands-on
Manipulating the data is by far the best way to make the OMOP model your own. To practise on the tables we’ve just walked through, we’ve put together a series of interactive SQL tutorials, runnable directly in your browser on a real OMOP sample — no setup required:
- Beginner level — first
SELECTs, first joins, counting patients, filtering by diagnosis. - Intermediate level — aggregations, time windows, working with source/standard duality.
- Advanced level — cohorts, complex multi-table queries, dose calculations.
Going further
This article has covered the tables that matter most for clinical practice. The other tables (PROVIDER, CARE_SITE, LOCATION, COST, SPECIMEN, EPISODE, NOTE_NLP…) follow the same principles.
- Our interactive OMOP v5.4 schema explorer gives column-by-column detail for every table.
- Chapter 4 of the Book of OHDSI is the complete official documentation.
- Precise conventions (date formats, constraints, allowed values) are documented in Themis, OHDSI’s official ETL conventions repository.
- OMOP CDM is a relational model of about 40 tables, split into six categories. v5.4 has been stable since 2021.
- Three conventions structure the whole model: person-centric (person_id + date everywhere), source/standard duality (concept_id vs source_value vs source_concept_id), and type_concept_id to trace the provenance of each record.
- concept_id = 0 flags an unmapped source code. It's the simplest quality indicator to watch first.
- PERSON anchors the whole model; VISIT_OCCURRENCE orchestrates encounters with the health system; VISIT_DETAIL captures unit-level stays (ICU, medical ward…).
- Seven clinical event tables, one per domain: CONDITION_OCCURRENCE (diagnoses), DRUG_EXPOSURE (medications), PROCEDURE_OCCURRENCE (procedures), MEASUREMENT (quantifiable values), OBSERVATION (clinical facts without a natural value), DEVICE_EXPOSURE (medical devices) and NOTE (free text).
- The dose actually administered isn't stored in DRUG_EXPOSURE: you have to join with DRUG_STRENGTH.
- Column-by-column detail is in the interactive explorer; precise conventions (dates, constraints, allowed values) live in Themis. And above all: get hands-on with the SQL tutorials to really internalize the model.