The OMOP CDM model

Summary

OMOP CDM is a relational database model designed to make health data queryable in the same way across every hospital and every database that adopts it. It contains around forty tables organized into six broad categories — clinical, health system, vocabularies, health economics, derived elements, metadata — all built around a single principle: every clinical event is tied to a person and a date. Version 5.4, stable since 2021, is the reference version in the OHDSI community today.

What you already know

The previous articles in this section laid out the why of OMOP (the original project, its community, the idea of a shared format). This article opens the how: what the model actually looks like when you examine its structure.

To follow, you just need a few notions covered in the Understanding health data warehouses section:

A relational database is a collection of tables linked together by keys — shared columns whose value identifies the same object from one table to the next (for example person_id, which appears in every event table to tie each row back to a unique patient in the PERSON table). That’s exactly the logic described in the article How health data is organized.
Health data is structured in long format: one row per event, columns to qualify that event.
Medical codes (ICD-10, SNOMED CT, LOINC…) are kept in a standardized form to enable comparability across databases.

OMOP CDM applies these principles rigorously: a precisely-specified relational schema, identical from one institution to the next. That schema is what we detail from here on.

A brief history of versions

The Common Data Model isn’t frozen: it evolves through versions published by OHDSI on GitHub. Knowing the history helps you understand the version you’re using — and the ones you should avoid.

5.3

CDM v5.3 — January 2019

A stable version for two and a half years, which served as the foundation for many OMOP warehouses built during that period. Still found in databases that haven’t migrated to 5.4.

5.4

CDM v5.4 — September 2021

Current

Adds the EPISODE and EPISODE_EVENT tables, brings COHORT into the model, and adds several fields to existing tables (latitude/longitude in LOCATION, etc.). The full list of changes is documented here. This is today’s reference version — stable for over four years, which is one of OMOP’s strengths.

6.0

CDM v6.0 — October 2021, paused

This release made datetime fields (with a precise time) mandatory where they had previously been optional. The official README states that v6.0 “is not fully supported by the OHDSI suite of tools and methods” and explicitly asks institutions to stay on v5.4 until the community has settled the role of dates vs datetimes.

5.5

CDM v5.5 — expected August 2026

An incremental evolution is in the works, tracked publicly in the official GitHub milestone, due August 2026. On the menu, among others: a new PACK_CONTENT table that describes the composition of drug packs (which active ingredients, in what quantity, in what packaging), a VALUE_AS_DATE field in OBSERVATION, and several foreign-key fixes. We’ll update this article when the version ships.

The broad table categories

The CDM v5.4 organizes its ~40 tables into six broad categories. Not all are used in every database: most hospital warehouses focus on clinical data and the health system.

Standardized clinical data

The patients themselves, their hospitalizations, diagnoses, medications, lab tests and medical procedures. This is the core of the model — what’s observed about patients.

Standardized vocabularies

The medical concepts dictionary: codes, labels, relationships between concepts, and hierarchies. Covered in the next article.

Standardized health system

The care context: the hospital units, healthcare professionals, and geographical locations where care took place.

Standardized derived elements

Pre-computed aggregates from clinical data, built to speed up recurrent analyses (for example, collapsing several successive prescriptions into a single continuous exposure period).

Standardized health economics

Costs and reimbursements associated with care, mostly populated in claims databases.

Standardized metadata

Documentation of the warehouse itself: which CDM version is used, what scope it covers, and technical information about the instance.

The interactive explorer

To see the details of each table (columns, data types, primary and foreign keys, constraints), use our interactive OMOP v5.4 schema explorer. It covers all tables in the model with cross-links between them.

Three fundamental conventions

Before diving into the tables one by one, three principles run through the whole model. Understanding them will save you a lot of time in practice.

1. Person-centric model

Every record in every event table is tied to a patient identifier (person_id) and at least one date. This is what guarantees that any analysis can produce a longitudinal view of a patient’s health.

The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum.

— Book of OHDSI, section 4.1 Design Principles

2. Source / standard duality

This is probably the most important OMOP convention to understand. For every clinical event, the model keeps two parallel representations:

The original source code as it exists in the hospital system (for example an ICD-10 code for a diagnosis).
The standardized OMOP concept that the source code has been mapped to (for example the equivalent SNOMED CT concept).

Concretely, every event table follows the same column pattern:

- <event>_concept_id        : standard OMOP concept identifier
- <event>_source_value      : raw source code/label
- <event>_source_concept_id : OMOP concept matching the source code

A concrete example — a heart rate in MEASUREMENT

Suppose an ICU bedside monitor records heart rate under the local label “HR bpm”, with no structured code. Here’s what the row looks like after the OMOP ETL:

Column	Value	Reading
measurement_concept_id	3027018	The OMOP standard concept for Heart rate (mapped to LOINC `8867-4`).
measurement_source_value	HR bpm	The label as it appeared in the source system, kept for traceability.
measurement_source_concept_id	0	”HR bpm” isn’t a recognized structured code: no source concept to attach, hence 0.

If the source had used the LOINC code 8867-4 directly, then measurement_source_concept_id would also have been 3027018: LOINC being a standard OMOP vocabulary, the source code and the standard concept coincide.

Why this dual representation? Because it gives both international comparability (via the standard concept) and traceability (by keeping the original source code you can always verify the data at the source).

Concept_id = 0, OMOP's 'NULL'

When a source code has no mapping to a standard concept, its <event>_concept_id is set to 0. It’s OMOP’s “flavor of NULL” — it flags an unstandardized concept without needing classic SQL NULL filtering. Quality checks routinely start by counting concept_id = 0.

3. type_concept_id: where does this record come from?

Many tables include a <event>_type_concept_id column. Watch out for the trap: this is not a categorization of the event itself, but an indication of its provenance.

For example, for a drug in DRUG_EXPOSURE, the drug_type_concept_id tells you whether the data comes from a prescription, a dispensing or an administration — not the nature of the drug.

This is useful for quality: you may want to restrict an analysis to drugs actually dispensed or administered, not merely prescribed.

With those three conventions in mind, let’s now look at the main tables of the model. We group them into two families: first the ones describing the patient and their journey through the healthcare system, then the ones describing the clinical events observed along that journey.

The patient and their journey

PERSON — the patients table

Role:: Patient demographics (sex, year of birth, origin).
One row per:: Unique patient in the database.
Connected to:: Every clinical table via person_id. It’s the anchor of the whole model.

Everything starts with PERSON. It’s the demographics table: one row per unique patient in the database.

Column	Content
`person_id`	Unique patient identifier (primary key)
`gender_concept_id`	Sex as an OMOP standard concept
`year_of_birth`	Year of birth (only mandatory date-of-birth field)
`month_of_birth`, `day_of_birth`	Optional, depending on source availability
`race_concept_id`, `ethnicity_concept_id`	Standardized demographic categories
`location_id`, `provider_id`, `care_site_id`	Links to context tables

Note the nuance between year_of_birth (mandatory) and month_of_birth/day_of_birth (optional). The model accepts a partially-known birth date — common in claims databases.

Table preview

Three distinct patients, each with their own person_id. Two women and one man, born between 1958 and 1985. For patient 1042, the full date of birth is known (12 April 1958); for the others, only the year is recorded.

person_id	gender_concept_id	year_of_birth	month_of_birth	day_of_birth
1042	8532 (Female)	1958	4	12
1043	8507 (Male)	1972	NULL	NULL
1044	8532 (Female)	1985	NULL	NULL

VISIT_OCCURRENCE — encounters with the health system

Role:: Describes each encounter between the patient and the healthcare system (hospitalization, outpatient visit, emergency…).
One row per:: Visit (a given patient usually has several over time).
Connected to:: PERSON, CARE_SITE, and every clinical event table referencing it via visit_occurrence_id.

Every encounter between a patient and the healthcare system (hospitalization, outpatient visit, emergency…) generates a row in VISIT_OCCURRENCE.

Column	Content
`visit_occurrence_id`	Unique visit identifier
`person_id`	The patient
`visit_concept_id`	Visit type (inpatient, emergency, outpatient…)
`visit_start_date`, `visit_end_date`	Time boundaries
`care_site_id`	Care unit or facility
`visit_source_value`	Source label for the visit

Table preview

Three visits for two patients. Patient 1042 was hospitalized for a week in March, then seen as an outpatient two months later. Patient 1043 visited the ER without being admitted (a single-day visit).

visit_occurrence_id	person_id	visit_concept_id	visit_start_date	visit_end_date	care_site_id
78201	1042	9201 (Inpatient Visit)	2022-03-14	2022-03-21	12
78202	1042	9202 (Outpatient Visit)	2022-05-02	2022-05-02	47
78203	1043	9203 (Emergency Room Visit)	2023-11-08	2023-11-08	8

VISIT_DETAIL — unit-level stays within a hospitalization

Role:: Breaks a hospitalization down by unit: emergency, ICU, medical ward, surgery…
One row per:: Stay in a unit (a long hospitalization generates several).
Connected to:: VISIT_OCCURRENCE (the parent stay), PERSON, and CARE_SITE.

During a hospitalization, a patient often moves through several units: emergency on arrival, ICU, then a medical ward for the remainder of the stay. VISIT_DETAIL captures these intermediate steps, each row pointing to the parent stay’s visit_occurrence_id — so you keep the overall view while still being able to analyse each unit-level stay.

Column	Content
`visit_detail_id`	Unique sub-stay identifier
`visit_occurrence_id`	Link to the parent visit
`person_id`	The patient
`visit_detail_concept_id`	Sub-stay type (ICU, generic inpatient…)
`visit_detail_start_date`, `visit_detail_end_date`	Time boundaries
`care_site_id`	Specific unit (medical ward, surgery, cardiology…)
`parent_visit_detail_id`	To model hierarchical sub-stays

Table preview — visit 78201 broken down

Visit 78201 for patient 1042 (a week-long hospitalization) is broken into three sub-stays: emergency on admission day, three days in ICU, then four days on a medical ward. All three rows point to the same visit_occurrence_id.

Heads up: OMOP standardizes few hospital units — only ICU has its own concept (32037 Intensive Care). For other inpatient stays, you fall back to the generic 9201 Inpatient Visit, and the precise ward is carried by care_site_id.

visit_detail_id	visit_occurrence_id	person_id	visit_detail_concept_id	visit_detail_start_date	visit_detail_end_date	care_site_id
91001	78201	1042	9203 (Emergency Room Visit)	2022-03-14	2022-03-14	8
91002	78201	1042	32037 (ICU)	2022-03-14	2022-03-17	23
91003	78201	1042	9201 (Inpatient Visit)	2022-03-17	2022-03-21	12

Clinical events

Systematic links back to the patient's journey

Every clinical event table that follows (MEASUREMENT, OBSERVATION, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE) carries the same foreign keys to the patient’s journey: person_id (to PERSON), visit_occurrence_id (to VISIT_OCCURRENCE) and visit_detail_id (to VISIT_DETAIL). That’s what lets you go, in a few joins, from any event back to its patient, its stay, and the unit where it took place.

MEASUREMENT — quantifiable values

Role:: Stores quantifiable clinical values: lab results, vital signs, clinical scores.
One row per:: Recorded measurement (the same parameter is measured many times during a stay).
Connected to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and LOINC on the standard side.

MEASUREMENT holds everything that’s measured with a value — number, interval, or coded categorical result.

Column	Content
`measurement_id`	Unique measurement identifier
`person_id`	The patient
`measurement_concept_id`	Standard concept (most often LOINC)
`measurement_date`, `measurement_datetime`	Date (and time if available) of the measurement
`value_as_number`	Numeric value
`value_as_concept_id`	Coded value (for qualitative results: positive/negative, etc.)
`unit_concept_id`	Unit of the value
`range_low`, `range_high`	Reference range bounds
`visit_occurrence_id`, `visit_detail_id`	Context of the measurement
`measurement_source_value`	Source label

Table preview

Three measurements taken for patient 1042 during her hospitalization: a heart rate (88 bpm), a low-grade fever (38.6 °C), and a serum urea nitrogen. Each row pairs a LOINC concept with its numeric value and unit.

measurement_id	person_id	measurement_concept_id	measurement_date	value_as_number	unit_concept_id
220841	1042	3027018 (Heart rate)	2022-03-14	88	8541 (/min)
220842	1042	3020891 (Body temperature)	2022-03-14	38.6	586323 (°C)
220843	1042	3013682 (Urea nitrogen)	2022-03-15	7.2	8753 (mmol/L)

OBSERVATION — clinical facts without a numeric value

Role:: Stores clinical facts without a natural numeric value: clinical signs, lifestyle, reason for visit.
One row per:: Observed clinical fact for a patient.
Connected to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and most often SNOMED CT.

OBSERVATION is where clinical events that don’t fit any other domain table land: smoking status, occupation, reason for visit, family history, qualitative score, environmental exposure… When a concept doesn’t belong to the Condition, Drug, Procedure or Measurement domains, this is where it ends up.

Column	Content
`observation_id`	Unique observation identifier
`person_id`	The patient
`observation_concept_id`	Standard concept (most often SNOMED CT)
`observation_date`, `observation_datetime`	Date (and time if available) of the observation
`value_as_number`	Numeric value if relevant
`value_as_string`	Text value (free-text answer)
`value_as_concept_id`	Coded value
`qualifier_concept_id`	Qualifier for the observation
`visit_occurrence_id`, `visit_detail_id`	Context
`observation_source_value`	Source label

A frequent question: where does a given piece of data go — MEASUREMENT or OBSERVATION? The official rule is that the concept’s domain determines the table — a Measurement-domain concept goes to MEASUREMENT, an Observation-domain concept goes to OBSERVATION. ATHENA tells you the domain of each concept.

Table preview

Three clinical facts without a numeric value: a smoking status for patient 1042 (Yes), a more nuanced one for patient 1043 (former smoker), and patient 1044’s reason for visit (check-up). The value_as_string column stores the free-text answer here.

observation_id	person_id	observation_concept_id	observation_date	value_as_string
404001	1042	40766306 (Smoked ≥ 100 cig.)	2022-03-14	Yes
404002	1043	4275495 (Tobacco — finding)	2023-11-08	Former smoker
404003	1044	4203722 (Reason for visit)	2024-02-19	Routine check-up

CONDITION_OCCURRENCE — diagnoses

Role:: Lists every diagnosis recorded for a patient (principal, comorbidities, problem list…).
One row per:: Recorded diagnosis (a visit can generate several rows).
Connected to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, ICD-10 on the source side).

Every diagnosis given for a patient becomes a row in CONDITION_OCCURRENCE. A single admission can generate multiple rows (a principal diagnosis plus multiple comorbidities).

Column	Content
`condition_occurrence_id`	Unique diagnosis identifier
`person_id`	The patient
`condition_concept_id`	Standard concept (most often SNOMED CT)
`condition_start_date`	Start date
`condition_end_date`	End date if known (chronic or resolved)
`visit_occurrence_id`	Link to the associated visit
`condition_source_value`	Source code (often ICD-10)
`condition_source_concept_id`	Concept matching the source code
`condition_type_concept_id`	Provenance: principal diagnosis, comorbidity, problem list…

Table preview

Three diagnoses. Patient 1042 has type 2 diabetes known since 2019 and hypertension diagnosed during her 2022 hospitalization. Patient 1043 had a myocardial infarction in November 2023. Each row keeps the ICD-10 source code alongside the standard SNOMED CT concept.

condition_occurrence_id	person_id	condition_concept_id	condition_start_date	visit_occurrence_id	condition_source_value
601201	1042	201826 (T2 diabetes)	2019-06-22	78101	E11.9
601202	1042	320128 (Essential HTN)	2022-03-14	78201	I10
601203	1043	4329847 (Myocardial inf.)	2023-11-08	78203	I21.4

PROCEDURE_OCCURRENCE — medical procedures

Role:: Lists the medical procedures performed on a patient: surgeries, imaging studies, technical acts, etc.
One row per:: Recorded procedure.
Connected to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and the vocabularies (SNOMED CT on the standard side, CCAM or CPT-4 on the source side depending on the country).

PROCEDURE_OCCURRENCE documents everything that is done to the patient, as opposed to what is observed about them (CONDITION_OCCURRENCE) or given to them (DRUG_EXPOSURE).

Column	Content
`procedure_occurrence_id`	Unique procedure identifier
`person_id`	The patient
`procedure_concept_id`	Standard concept (most often SNOMED CT)
`procedure_date`, `procedure_datetime`	Date (and time if available) of the procedure
`procedure_end_date`, `procedure_end_datetime`	End date (and time), for procedures that span time
`visit_occurrence_id`, `visit_detail_id`	Context
`quantity`	Number of procedures
`procedure_source_value`	Source code (CCAM in France, CPT-4 in the US, ICD-10-PCS…)
`procedure_source_concept_id`	Concept matching the source code
`procedure_type_concept_id`	Provenance of the record

Table preview

Three medical procedures. For the first two — patient 1042’s appendectomy and patient 1043’s hemodialysis session — exact start and end times are recorded (datetime fields), letting you compute the actual procedure duration. Patient 1044’s blood draw, much shorter, has no end time recorded. The procedure_source_value keeps the source code (CCAM in France).

procedure_occurrence_id	person_id	procedure_concept_id	procedure_datetime	procedure_end_datetime	visit_occurrence_id	procedure_source_value
810501	1042	4198190 (Appendectomy)	2022-03-15 09:30	2022-03-15 11:00	78201	HHFA002
810502	1043	37397396 (Hemodialysis)	2023-11-09 08:00	2023-11-09 12:15	78203	JVJF002
810503	1044	4332170 (Venipuncture)	2024-02-19 09:15	NULL	78305	HSHF002

DEVICE_EXPOSURE — medical devices

Role:: Records exposure of the patient to a physical object used for diagnostic or therapeutic purposes: implants (stents, prostheses), care equipment (catheters, tubes), consumables (syringes, dressings).
One row per:: Device exposure.
Linked to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and SNOMED CT on the standard side (Device domain).

DEVICE_EXPOSURE stores what is applied to the patient when it’s a physical object — as opposed to a drug (DRUG_EXPOSURE) or an action (PROCEDURE_OCCURRENCE). The boundary with PROCEDURE_OCCURRENCE can blur: implanting a stent is a procedure, but the stent itself is a device. Both tables often coexist for the same episode of care.

Column	Contents
`device_exposure_id`	Unique identifier of the exposure
`person_id`	The patient
`device_concept_id`	Standard concept (SNOMED CT, Device domain)
`device_exposure_start_date`, `device_exposure_end_date`	Exposure period
`unique_device_id`, `production_id`	FDA Unique Device Identifier (UDI) if available
`quantity`	Number of devices used
`visit_occurrence_id`, `visit_detail_id`	Context
`device_source_value`	Source label or code
`device_type_concept_id`	Provenance (claims, EHR…)

Table preview

Three devices. During her ICU stay, patient 1042 had a central venous catheter (in place for three days) and a urinary catheter. Patient 1043 received a coronary stent during his emergency visit for myocardial infarction — an implant, with no end date.

device_exposure_id	person_id	device_concept_id	device_exposure_start_date	device_exposure_end_date	visit_occurrence_id
115001	1042	4160023 (Central venous catheter)	2022-03-14	2022-03-17	78201
115002	1042	4047968 (Urinary catheter)	2022-03-14	2022-03-16	78201
115003	1043	4252356 (Coronary artery stent)	2023-11-08	NULL	78203

NOTE — free-text clinical documents

Role:: Captures the free-text documents produced by clinicians: discharge summaries, consultation letters, clinical observations, operative reports…
One row per:: Text document produced for a patient.
Linked to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, PROVIDER (the author), and LOINC (Document Ontology) on the standard side.

NOTE holds the unstructured data: everything the other tables don’t capture because it’s free-form writing. It’s the raw material for NLP, whose results land in a dedicated table NOTE_NLP.

Column	Contents
`note_id`	Unique identifier of the document
`person_id`	The patient
`note_date`, `note_datetime`	Date (and time if known) the note was written
`note_class_concept_id`	LOINC document-type classification (admission note, discharge summary…)
`note_title`	Document title
`note_text`	Full document content (text)
`encoding_concept_id`	Character encoding (typically UTF-8)
`language_concept_id`	Document language
`provider_id`	Author of the document
`visit_occurrence_id`, `visit_detail_id`	Context
`note_type_concept_id`	Provenance (EHR, dictation, import…)

Table preview

Three notes produced during patient 1042’s hospitalization: an admission note written in the emergency department, an operative report after the appendectomy, and a discharge summary. The full content sits in note_text (truncated here for display). Each row carries its LOINC classification, which lets you filter by document type — for instance, all discharge summaries from a given unit.

note_id	person_id	note_date	note_class_concept_id	note_title	visit_occurrence_id
700101	1042	2022-03-14	706470 (Admission note)	Admission note	78201
700102	1042	2022-03-15	706526 (Operative note)	Operative report — appendectomy	78201
700103	1042	2022-03-21	706473 (Discharge summary)	Discharge summary	78201

Working with free text

Raw note_text is rarely usable as-is for analysis: it needs to be structured. That’s the role of NLP — extraction of clinical entities, negations, relations — whose results go in NOTE_NLP, one row per extracted term.

DRUG_EXPOSURE — medications

Role:: Covers every drug exposure: prescriptions, pharmacy dispensings, in-hospital administrations.
One row per:: Drug exposure (a chronic treatment generates several rows over time).
Connected to:: PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, and RxNorm on the standard side.

DRUG_EXPOSURE covers all drug exposures: prescriptions, dispensings, in-hospital administrations. drug_type_concept_id distinguishes these provenances.

Column	Content
`drug_exposure_id`	Unique identifier
`person_id`	The patient
`drug_concept_id`	Standard concept (most often RxNorm)
`drug_exposure_start_date`, `drug_exposure_end_date`	Exposure period
`quantity`, `days_supply`, `dose_unit_source_value`	Dosage
`route_concept_id`	Route of administration
`drug_source_value`	Source label
`drug_type_concept_id`	Record type (prescription, dispensing, etc.)

Table preview

Three drug exposures with different provenances: a celecoxib prescription for patient 1042 at hospital discharge, an in-hospital ibuprofen administration the day after the appendectomy, and a pharmacy dispensing of an analgesic for patient 1043. drug_type_concept_id distinguishes the three cases.

drug_exposure_id	person_id	drug_concept_id	drug_exposure_start_date	quantity	drug_type_concept_id
920301	1042	1118084 (celecoxib)	2022-03-15	30	32838 (prescription)
920302	1042	19078461 (ibuprofen 200 mg)	2022-03-16	1	32818 (administration)
920303	1043	19133768 (paracetamol+hydrocodone)	2023-11-10	20	32825 (dispensing)

Computing the actual dose received

A quirk of OMOP: the dose taken by the patient is not stored directly in DRUG_EXPOSURE. The quantity column only gives the number of units dispensed (tablets, vials, millilitres…). To go from a number of units to an actual dose in milligrams, you have to join DRUG_EXPOSURE with the DRUG_STRENGTH table, which holds the active ingredient concentration for every standard drug product.

Outpatient example: a treatment over several days

For patient 1042’s prescription of 30 celecoxib 200 mg tablets over 7 days (row 920301 in the preview above):

DRUG_EXPOSURE.quantity = 30 (tablets dispensed)
DRUG_STRENGTH.amount_value = 200 (mg per tablet for the standard product)
DRUG_EXPOSURE.days_supply = 7 (days of treatment)

→ Total dose: 30 × 200 = 6,000 mg over the period → Daily dose: 6,000 ÷ 7 ≈ 857 mg/day (about 1.5 tablets per day)

Inpatient example: an IV infusion with a varying rate

In hospital settings, especially the ICU, you often get one row per administration — even one row per rate change for continuous infusions. This is very different from the outpatient model: here, it’s the drug_exposure_start_datetime and drug_exposure_end_datetime columns that carry the fine-grained timing, accurate to the minute.

Picture patient 1042 during her ICU sub-stay 91002 on IV norepinephrine (concept 37354531, concentration 0.128 mg/mL = 128 µg/mL). Three rate changes over the afternoon:

drug_exposure_start_datetime	drug_exposure_end_datetime	quantity (mL infused)
2022-03-14 13:37	2022-03-14 14:01	2.0
2022-03-14 14:01	2022-03-14 16:20	27.8
2022-03-14 16:20	2022-03-14 18:30	17.3

Each row represents a period of stable rate (the rate itself isn’t stored as such — you derive it from the infused volume divided by the duration between start_datetime and end_datetime). The dose actually delivered is then reconstructed by multiplying the infused volume (quantity) by the product’s concentration (numerator_value / denominator_value in DRUG_STRENGTH).

→ On the middle row (27.8 mL at 0.128 mg/mL): 3.56 mg of norepinephrine delivered over 2h19, i.e. about 1.5 mg/h.

For an analysis spanning an entire ICU stay, you have to sum all of these rows by patient and by drug, factoring in the duration of each rate.

Which formula for which dose form?

The calculation rule depends on the dose form:

Tablet, capsule, single-dose vial: quantity × amount_value
Solution, infusion, suspension: quantity × numerator_value / denominator_value
Transdermal patch: use the rate (numerator_value per time unit) rather than a total quantity

Details in the official OMOP dose calculation docs. We’ll revisit DRUG_STRENGTH in more detail in the next article on vocabularies.

Time to get hands-on

Manipulating the data is by far the best way to make the OMOP model your own. To practise on the tables we’ve just walked through, we’ve put together a series of interactive SQL tutorials, runnable directly in your browser on a real OMOP sample — no setup required:

Beginner level — first SELECTs, first joins, counting patients, filtering by diagnosis.
Intermediate level — aggregations, time windows, working with source/standard duality.
Advanced level — cohorts, complex multi-table queries, dose calculations.

Going further

This article has covered the tables that matter most for clinical practice. The other tables (PROVIDER, CARE_SITE, LOCATION, COST, SPECIMEN, EPISODE, NOTE_NLP…) follow the same principles.

Our interactive OMOP v5.4 schema explorer gives column-by-column detail for every table.
Chapter 4 of the Book of OHDSI is the complete official documentation.
Precise conventions (date formats, constraints, allowed values) are documented in Themis, OHDSI’s official ETL conventions repository.

OMOP CDM is a relational model of about 40 tables, split into six categories. v5.4 has been stable since 2021.
Three conventions structure the whole model: person-centric (person_id + date everywhere), source/standard duality (concept_id vs source_value vs source_concept_id), and type_concept_id to trace the provenance of each record.
concept_id = 0 flags an unmapped source code. It's the simplest quality indicator to watch first.
PERSON anchors the whole model; VISIT_OCCURRENCE orchestrates encounters with the health system; VISIT_DETAIL captures unit-level stays (ICU, medical ward…).
Seven clinical event tables, one per domain: CONDITION_OCCURRENCE (diagnoses), DRUG_EXPOSURE (medications), PROCEDURE_OCCURRENCE (procedures), MEASUREMENT (quantifiable values), OBSERVATION (clinical facts without a natural value), DEVICE_EXPOSURE (medical devices) and NOTE (free text).
The dose actually administered isn't stored in DRUG_EXPOSURE: you have to join with DRUG_STRENGTH.
Column-by-column detail is in the interactive explorer; precise conventions (dates, constraints, allowed values) live in Themis. And above all: get hands-on with the SQL tutorials to really internalize the model.