Building an OMOP ETL

Summary

Building an OMOP ETL is often perceived as a multi-month effort. The process can be broken down incrementally and integrated into a team’s regular workflow. Most of the work is semantic mapping — producing the SOURCE_TO_CONCEPT_MAP.csv file. Once that mapping is in place, the rest is mostly mechanical: each source row is routed to the right OMOP table based on the standard concept’s domain_id. This article walks through that approach, presents Linkr’s mapping module, and then has you build a mini MIMIC-IV → OMOP ETL step by step by executing the SQL in your browser.

OMOP ETL: an incremental approach

When discovering OMOP, it’s tempting to picture the ETL as a multi-year effort aiming to standardise the entire local dictionary before any data can be used. That view is legitimate — it reflects what several pioneering teams have actually gone through — but it isn’t the only one.

A more incremental approach lets you deliver a first useful OMOP ETL in a few weeks, then expand coverage progressively over successive projects. It fits inside a data-warehouse team’s normal workflow, without upending habits. It rests on two principles:

Project by project. For a given study you need a few dozen variables. You only map those; the others wait for another project that will reuse what has already been mapped.
Semantic mapping first, the rest follows mechanically. Once the SOURCE_TO_CONCEPT_MAP table is produced, the ETL becomes mostly routing: you look at each standard concept’s domain_id and it directly tells you which OMOP table the data lands in.

A direct consequence of this approach: the OMOP database you produce won’t necessarily contain every table in the model. You populate the tables that match the concepts actually mapped for the project — typically PERSON, VISIT_OCCURRENCE, plus a few event tables (CONDITION_OCCURRENCE, MEASUREMENT, DRUG_EXPOSURE…). The others (DEVICE_EXPOSURE, SPECIMEN, NOTE…) stay empty until a project needs them. That reduced scope is enough for the current project: the analyses will only rely on the tables actually populated, and that’s what matters to deliver.

A complete ETL remains the goal

If your team’s resources allow it, a complete ETL remains the goal: map the entire local dictionary, populate every OMOP table, aim for high coverage from the first release. What we present here is a pragmatic alternative for teams that don’t have those resources upfront and still want to start producing — extending project after project.

This article follows that incremental path. We first present the semantic part (the most demanding, involving domain experts), then routing by domain_id, and finally walk through an immersive MIMIC-IV example: you execute the SQL yourself in your browser, step by step, to build a mini OMOP database.

Prerequisites

This article assumes you have read the previous OMOP series articles: the OMOP CDM model, OMOP vocabularies and the OHDSI tools ecosystem. We use concepts like concept_id, domain_id, standard_concept, Maps to and concept_id = 0 without redefining them.

Semantic mapping, the heart of the work

Semantic mapping consists of linking each local code (a MIMIC itemid, an internal lab identifier, a facility-specific label) to a standard OMOP concept: a LOINC identifier for a lab measurement, a SNOMED CT for a diagnosis or observation, an RxNorm for a drug. The output is a CSV file: the famous SOURCE_TO_CONCEPT_MAP.

This is the most time-consuming part of the whole ETL. It relies on domain experts: biologists for lab parameters, pharmacists for drugs, clinicians (intensivists, internists, emergency physicians…) for diagnoses, observations and procedures. The quality of the mapping determines the quality of everything that follows — this is where you take the necessary time.

Linkr’s mapping module

To support this work, Linkr offers a dedicated module — Concept Mapping. Creating a mapping project takes a few clicks, by importing a source file (CSV, Parquet…) listing the concepts to map. During that import, Linkr offers a parameter step where you pick the columns matching the local code, the label, the category, the frequency and the JSON metadata:

linkr-v2-b1800b.frama.io

Import settings

Configure file parsing and map columns to concept fields.

Delimiter

Auto-detect

Encoding

UTF-8

Skip rows

Header row

Yes

Column mapping

Map file columns to concept fields. At least the concept name or code is required.

Vocabulary

— None —

#	itemid	label	fluid	category
1	50808	Free Calcium	Blood	Blood Gas
2	50826	Tidal Volume	Blood	Blood Gas
3	50813	Lactate	Blood	Blood Gas
4	52029	% Ionized Calcium	Blood	Blood Gas
5	50801	Alveolar-arterial Gradient	Blood	Blood Gas
6	50810	Hematocrit, Calculated	Blood	Blood Gas
7	50820	pH	Blood	Blood Gas
8	50912	Creatinine	Blood	Chemistry
9	51221	Hematocrit	Blood	Hematology
10	51301	White Blood Cells	Blood	Hematology

Vocab	Concept name	Concept code	Patients	Records
ICCA	creatinine serum	L0021	1,842	9,614
ICCA	lactate blood	L0102	1,203	4,871
ICCA	hemoglobin	L0301	1,688	7,203
ICCA	platelet count	L0411	974	3,120
ICCA	WBC count	L0512	881	2,954

Name	Code	Vocab.	Domain
Creatinine [Mass/volume] in Serum or Plasma	2160-0	LOINC	Measurement
Creatinine [Mass/volume] in Blood	38483-4	LOINC	Measurement
Creatinine [Moles/volume] in Serum or Plasma	14682-9	LOINC	Measurement

Agentic-AI-assisted mapping

Hand-mapping remains slow. To speed up the first pass, Linkr integrates with Claude Code via a dedicated concept-mapping skill. An agent processes source concepts in batches, proposes the best standard candidates for each, and produces a combined score:

linkr-v2-b1800b.frama.io

Concept setsSearchSuggestions

Score	Methods	Vocab	Concept name	Code	Std

92%	95%88%91%	LOINC	Creatinine [Mass/volume] in Serum or Plasma	2160-0	S
81%	83%79%82%	LOINC	Creatinine [Mass/volume] in Blood	38483-4	S
76%	80%71%77%	LOINC	Creatinine [Moles/volume] in Serum or Plasma	14682-9	S
68%	72%64%	SNOMED	Serum creatinine measurement	113075003	S
54%	58%51%	LOINC	Creatinine renal clearance	2164-2	S

8 / 24 results⚙

The Suggestions panel: combined score, coloured dots per method (Syntactic, Semantic, Statistical, AI), weight management via the « Manage suggestions » button.

Four method families contribute to the score:

Syntactic — string similarity between the local label and standard labels (and their synonyms).
Semantic — distance between biomedical embeddings (BioLORD-2023-M), computed offline over the entire OMOP vocabulary.
Statistical (coming) — comparison of value distributions for numeric concepts.
Agentic — an LLM agent reviews the best candidates, applies clinical reasoning (consistent units? right granularity? correct domain?), and proposes an equivalence (exactMatch, closeMatch, broadMatch, narrowMatch) with a written justification.

Each suggestion still has to be validated or corrected explicitly in the editor — the AI prepares the ground, the final call always rests with the reviewer. The Suggestions page details the full mechanism.

Metadata and traceability

A concept mapping is more than a correspondence table. It is also a traceability dossier: who mapped what, when, with what reasoning, with what confidence, and what comments were exchanged during review. These metadata matter for the team that will later analyse the data. Linkr stores them with each mapping and can surface them on the source-concept side — distributions, statistics, properties, temporal charts:

linkr-v2-b1800b.frama.io

Heart_rate

ICCA · M0042 · 12,450 records · 1,842 patients

Statistics

Min	0
P25	65
Median	78
P75	92
Max	245
Mean	82.3
Std	18.7

Boxplot

Distribution

Distribution by sex

Female(50%)

Male(50%)

Trend by year

201920202021202220232024

Hospital units

ICU80%

Step-Down Unit20%

Properties

Unit	bpm
Frequency	Hourly
Completeness	98%
Source	Monitoring

The metadata view: statistics, boxplot, histogram, distributions and properties of the source concept.

The expected JSON structure is documented on the Mapping editor page. It accepts statistics (min/median/max/mean…), distributions (bar, pie or line charts), properties (free key/value pairs) and metadata (scalar metadata). All of it helps the reviewer validate a mapping with full context.

Multi-centre networks

In a research network where several hospitals collaborate on multi-centre projects, each institution can take on the mapping of a subset of concepts and contribute its work to a shared registry. That’s the principle of the Linkr portal: a static portal (deployable to GitLab Pages, GitHub Pages or any static host) that aggregates several mapping workspaces and projects, shared across teams. The Global view assigns custom source_concept_ids (range ≥ 2 billion) per hospital, in disjoint ranges, to avoid collisions when centralising data.

OHDSI tools — the classic alternative

The OHDSI community offers its own tool for this step: Usagi, a Java desktop application. You load your list of local codes, Usagi suggests standard concepts based on textual similarity, and you validate by hand. Its output is compatible with SOURCE_TO_CONCEPT_MAP. Usagi is stable, proven, free, offline — the historical route, and still a perfectly valid one.

Routing by `domain_id`

Once SOURCE_TO_CONCEPT_MAP is produced, the rest of the ETL becomes mostly mechanical. For each source row, you ask:

What OMOP standard concept is associated (via the SOURCE_TO_CONCEPT_MAP join)?
What is its domain_id?

And the domain_id directly determines the target OMOP table:

`domain_id`	Target OMOP table	Examples
`Condition`	`CONDITION_OCCURRENCE`	Diabetes, sepsis, hypertension
`Drug`	`DRUG_EXPOSURE`	Aspirin, heparin, furosemide
`Procedure`	`PROCEDURE_OCCURRENCE`	Colonoscopy, intubation, catheterisation
`Measurement`	`MEASUREMENT`	Creatinine, heart rate, hemoglobin
`Device`	`DEVICE_EXPOSURE`	Catheter, probe, ventilator
`Observation`	`OBSERVATION`	Status, insurance type, language
`Visit`	`VISIT_OCCURRENCE` / `VISIT_DETAIL`	Hospital stay, ED visit, ICU stay

Anything that doesn’t fit any of these tables lands by convention in OBSERVATION — OMOP’s general-purpose landing table for such cases. The implementation logic becomes the same for every event table: extract source rows, join on SOURCE_TO_CONCEPT_MAP, filter by the target’s domain_id, insert into the target table with standard _concept_ids, custom _source_concept_ids and _source_values for traceability.

Building a MIMIC-IV → OMOP ETL, step by step

Time to practice. We’ll build a mini OMOP database from MIMIC-IV together, executing the SQL in your browser (via DuckDB-Wasm). The source data is the public MIMIC-IV demo — 100 patients, their stays, measurements and prescriptions.

Before diving into the SQL, here are the five steps of the approach:

Extract the source concepts

Inventory what has to be mapped: one row per concept, with its frequencies and an info_json of metadata (distributions, units, histograms…).

Retrieve the mapping table

Mapping (manual or AI-assisted in Linkr) produces the source_to_concept_map.csv file that links each source code to its standard concept.

Load the reference vocabularies

Import CONCEPT, CONCEPT_RELATIONSHIP and friends from ATHENA, plus the SOURCE_TO_CONCEPT_MAP produced in step B.

Populate the OMOP tables

First the anchor tables (PERSON, VISIT_OCCURRENCE, VISIT_DETAIL), then a routing CTE on domain_id that fills MEASUREMENT, CONDITION_OCCURRENCE and OBSERVATION. DRUG_EXPOSURE is handled separately.

Check quality

Run ACHILLES and the Data Quality Dashboard, spot anomalies, fix, repeat.

Step A — Extract source concepts

The very first step of any OMOP ETL is not an INSERT: it’s the inventory of what needs to be mapped. For our scope (demographics, visit types, transfers, diagnoses, measurements and drugs), we extract distinct source concepts from MIMIC tables along with their frequencies (rows count, distinct patients) and some metadata (units, categories…) that will help the reviewer later. The extraction logic fits in one SQL query per category:

Here’s a representative excerpt of the extraction SQL for labevents lab parameters. The query computes in one pass the descriptive statistics, the histogram, the sex breakdown, the typical measurement frequency and the temporal distribution — all assembled into a single JSON object that follows the format Linkr expects (the other categories — chartevents, diagnoses_icd, transfers, admissions, prescriptions — follow a very similar pattern, adjusted depending on whether the concept is numeric or categorical):

-- Lab parameters (labevents) with a rich info_json
WITH lab_stats AS (
  SELECT
    le.itemid,
    COUNT(*)                                       AS record_count,
    COUNT(DISTINCT le.subject_id)                  AS patient_count,
    MIN(le.valuenum)                               AS v_min,
    MAX(le.valuenum)                               AS v_max,
    AVG(le.valuenum)                               AS v_mean,
    STDDEV(le.valuenum)                            AS v_sd,
    QUANTILE_CONT(le.valuenum, 0.05)               AS p5,
    QUANTILE_CONT(le.valuenum, 0.25)               AS p25,
    QUANTILE_CONT(le.valuenum, 0.50)               AS p50,
    QUANTILE_CONT(le.valuenum, 0.75)               AS p75,
    QUANTILE_CONT(le.valuenum, 0.95)               AS p95
  FROM labevents le
  WHERE le.itemid IN (50971, 50983, 50912, 50882, 51265, 51222, 51301, 50931)
  GROUP BY le.itemid
)
SELECT
  'MIMIC_LABITEMS'                                 AS vocabulary,
  'Lab test'                                       AS category,
  CAST(le.itemid AS VARCHAR)                       AS concept_code,
  ANY_VALUE(dli.label)                             AS concept_name,
  ANY_VALUE(s.record_count)                        AS record_count,
  ANY_VALUE(s.patient_count)                       AS patient_count,
  -- Compose the JSON metadata expected by Linkr's mapping editor
  json_object(
    'full_name',  'Lab test / ' || ANY_VALUE(dli.label),
    'data_source', 'MIMIC-IV',
    'data_types',  'numeric',
    'unit',        ANY_VALUE(le.valueuom),
    'numeric_data', json_object(
      'min',    ANY_VALUE(s.v_min),
      'p5',     ANY_VALUE(s.p5),
      'p25',    ANY_VALUE(s.p25),
      'median', ANY_VALUE(s.p50),
      'mean',   ANY_VALUE(s.v_mean),
      'p75',    ANY_VALUE(s.p75),
      'p95',    ANY_VALUE(s.p95),
      'max',    ANY_VALUE(s.v_max),
      'sd',     ANY_VALUE(s.v_sd)
    )
    -- + histogram, categorical_data (sex), temporal_distribution
    -- (assembled via similar sub-queries)
  )                                                AS info_json
FROM labevents le
JOIN d_labitems dli ON dli.itemid = le.itemid
JOIN lab_stats s    ON s.itemid   = le.itemid
WHERE le.itemid IN (50971, 50983, 50912, 50882, 51265, 51222, 51301, 50931)
GROUP BY le.itemid;

The resulting info_json column follows the structure documented on the Mapping editor page, which Linkr reads to render the source-concept detail card. For a vital sign like heart rate, the result looks like:

{
  "full_name": "Vital sign / Routine Vital Signs / Heart Rate",
  "data_source": "MIMIC-IV",
  "data_types": "numeric",
  "unit": "bpm",
  "missing_rate": 0.0,
  "numeric_data": {
    "min": 0.0, "p5": 62.0, "p25": 78.0, "median": 90.0, "mean": 91.12,
    "p75": 104.0, "p95": 122.0, "max": 200.0, "sd": 18.69
  },
  "histogram": [
    { "x": 10.0, "count": 3 },    { "x": 30.0, "count": 12 },
    { "x": 50.0, "count": 454 },  { "x": 70.0, "count": 3313 },
    { "x": 90.0, "count": 5555 }, { "x": 110.0, "count": 3685 },
    { "x": 130.0, "count": 764 }, { "x": 150.0, "count": 117 }
  ],
  "categorical_data": [
    { "category": "Female", "count": 6284, "percentage": 45.2 },
    { "category": "Male",   "count": 7629, "percentage": 54.8 }
  ],
  "temporal_distribution": {
    "start_date": "2110-04-11", "end_date": "2201-12-13",
    "by_year": [ { "year": 2110, "percentage": 6.5 }, … ]
  },
  "measurement_frequency": "every 1 hours"
}

The final result is a CSV with one row per source concept — for our scope, 46 concepts in total (2 sexes + 7 admission types + 8 care units + 10 diagnoses + 8 vitals + 8 lab parameters + 3 drugs).

Download source-concepts.csv

To go further, open the Linkr demo and:

Create a new mapping project (see the Mapping projects docs).
Import the source-concepts.csv file above, mapping the columns according to the import settings shown earlier.
Map concept by concept in the editor — by hand, or with AI assistance via the Suggestions page.

For the next steps, we skip this and fetch the already-produced mapping table directly.

Step B — Retrieve the mapping table

We assume the mapping is done: here’s the final file, ready to be loaded into the OMOP database.

Download source_to_concept_map.csv

46 rows, 9 columns total (full reference on the source_to_concept_map table page). Three representative rows, in the schema’s column order:

source_code	source_concept_id	source_vocabulary_id	source_code_description	target_concept_id	target_vocabulary_id	valid_start_date	valid_end_date	invalid_reason
F	2000000001	MIMIC_GENDER	Female	8532	Gender	1970-01-01	2099-12-31	NULL
220045	2000000028	MIMIC_CHARTEVENTS	Heart Rate	3027018	LOINC	1970-01-01	2099-12-31	NULL
Aspirin	2000000044	MIMIC_DRUG	Aspirin	1112807	RxNorm	1970-01-01	2099-12-31	NULL

Note the source_concept_ids in the custom OMOP range (≥ 2 billion) and the target_concept_ids pointing at standard concepts.

Step C — Load the reference vocabularies into the database

Before writing the business INSERTs, the target OMOP database has to hold the dictionary of standard concepts — typically the tables CONCEPT, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONCEPT_ANCESTOR, VOCABULARY, DOMAIN. We fetch them from ATHENA (free account required), ticking the vocabularies useful to our scope (SNOMED CT, LOINC, RxNorm, ICD10CM, UCUM…). The download returns a ZIP of CSV files to load via a COPY or an INSERT INTO ... SELECT FROM read_csv(...).

Once those vocabularies are in place, we also load our source_to_concept_map.csv into the table of the same name, and we add the custom concepts (the source_concept_ids ≥ 2 billion from our STCM) into the CONCEPT table — that’s what allows our _source_concept_id columns on event rows to point to valid entries in the database.

Tutorial prerequisites

To keep the demo lightweight, the interactive component below does not load the full CONCEPT table: it only prepares the source_to_concept_map table with our 46 mappings and creates the empty OMOP target tables. On a real OMOP database, Step C is mandatory — without it, neither ACHILLES nor the DQD can run properly, and queries that use the vocabulary to explore hierarchies (e.g. CONCEPT_ANCESTOR) will return nothing.

Step D — Populate the OMOP tables

Now the ETL proper. The MIMIC raw data and the mapping table are already loaded into a DuckDB database running in your browser. The OMOP target tables (PERSON, VISIT_OCCURRENCE, VISIT_DETAIL, MEASUREMENT, CONDITION_OCCURRENCE, OBSERVATION, DRUG_EXPOSURE) are created empty with the v5.4 schema.

Anchor tables

Three tables are populated first — each with its own INSERT:

PERSON from patients (gender, year of birth derived from anchor_year - anchor_age).
VISIT_OCCURRENCE from admissions (one stay = one hadm_id).
VISIT_DETAIL from transfers (one care-unit transfer = one transfer_id).

Routing by domain_id

On MIMIC, source vocabularies are partitioned by domain (MIMIC_CHARTEVENTS → Measurement, MIMIC_DRUG → Drug, MIMIC_ICD10 → Condition…) — you could in theory write one INSERT per vocabulary, relying on that partition.

However, to stay general and match the approach described earlier, we prepare a materialised CTE mapped_events that unifies the event sources (diagnoses, vital signs, labs) and resolves each row’s target domain_id via CONCEPT. We then filter this CTE by target_domain_id to dispatch into the right OMOP table:

target_domain_id = 'Measurement' → MEASUREMENT
target_domain_id = 'Condition' → CONDITION_OCCURRENCE
Everything else → OBSERVATION (the default fallback table)

PROCEDURE_OCCURRENCE, DEVICE_EXPOSURE, SPECIMEN, NOTE… follow exactly the same logic — just add a WHERE target_domain_id = '…' branch; we leave them out here because no concept in our scope maps to them.

DRUG_EXPOSURE is handled separately: its sources and specifics (dose, route, unit) don’t fit the same generic router.

Heterogeneous source vocabularies

On messier databases — typically an in-house hospital vocabulary that lumps lab tests, clinical observations, scores, even drugs into the same code table — per-vocabulary partitioning no longer works. The mapped_events CTE described here, which resolves the domain_id via CONCEPT, then becomes essential: it’s the only way to route each row correctly. That’s why we adopt this approach from the start — at the cost of one extra join — rather than an INSERT per vocabulary.

Each OMOP table has its own section below, with a ready-to-run INSERT INTO query followed by a SELECT that previews the result. To run a query, click Run (or use the ⌘ / Ctrl + Enter shortcut). The Reset query button above each editor restores the original version in one click. Run the queries in order: mapped_events must exist before the routing INSERTs, and PERSON / VISIT_OCCURRENCE must exist before DRUG_EXPOSURE.

The pattern is always the same: JOIN on source_to_concept_map, INSERT into the target table, COALESCE(..., 0) for unmapped codes so as not to drop rows, and the _source_value and _source_concept_id columns preserve traceability back to the original data.

DRUG_EXPOSURE — simplified teaching version

The INSERT into DRUG_EXPOSURE above is intentionally minimal: we extract the quantity from form_val_disp via regex, fill in route_source_value, dose_unit_source_value and drug_source_value, and that’s it. A real implementation computes the actual dose received from amount_value/amount_unit (unit quantity), numerator_value/denominator_value (concentration for solutions), the administration duration and the frequency — and leans on DRUG_STRENGTH (vocabulary side) to resolve product presentations. The formulas are detailed in the OMOP “drug dose” documentation. Worth doing in a real project, kept out here to avoid bloating the tutorial.

Step E — Validate the resulting database’s quality

After the INSERTs are done, the OMOP database exists. But it can contain silent inconsistencies: nonsensical dates, non-standard concepts where standards are expected, null required fields. Before any research use, you have to measure quality.

Two OHDSI tools are designed for this:

ACHILLES — an R package that produces a synthetic characterisation report: number of patients, age distributions, condition prevalence, drug frequencies, lab-measurement distributions. Excellent for taking stock of the database — and quickly spotting obvious anomalies.
Data Quality Dashboard (DQD) — around 4,000 automated checks across three axes: schema conformance, completeness of required fields, plausibility of values. The interactive HTML report lets you prioritise fixes.

In practice DQD almost always flags hundreds of anomalies on a freshly built database. Most are minor (non-standard concepts, missing dates on optional events…), a handful point at real ETL bugs. It’s an iterative cycle: run, fix, re-run, until the quality level is good enough for release.

Empty tables and DQD

DQD explicitly handles unpopulated tables — exactly the scenario of an incremental, project-by-project ETL. If a table exists but is empty, all field-level and concept-level checks for it are flagged as notApplicable (details in the official status documentation). These checks don’t count as violations. You can also proactively exclude tables via the tablesToExclude parameter of executeDqChecks() — by default, DQD already excludes vocabulary tables (CONCEPT, VOCABULARY, etc.).

What’s next?

Once your first OMOP database is in place, the next project will reuse existing mappings and only add a few new concepts. Project after project, coverage of the local dictionary grows steadily — without ever having to align everything at once.

An OMOP ETL doesn't have to be done in one shot: you proceed project by project, mapping a few dozen concepts at a time — just what the project actually needs.
Semantic mapping is the most time-consuming part. It involves domain experts (biologists, pharmacists, clinicians) and produces the SOURCE_TO_CONCEPT_MAP table.
Agentic AI (Claude Code embedded in Linkr, syntactic + semantic + LLM scoring with justified SKOS equivalences) considerably speeds up the first pass — a human then validates each mapping.
Once mapping is in place, routing follows the standard concept's domain_id: Condition → CONDITION_OCCURRENCE, Drug → DRUG_EXPOSURE, Measurement → MEASUREMENT, etc. Anything that doesn't fit elsewhere lands in OBSERVATION.
Every event table follows the same template: extract + JOIN on STCM + INSERT with the columns _concept_id (standard), _source_concept_id (custom, range ≥ 2 billion) and _source_value (original label, for traceability).
Multi-centre networks centralise mappings via linkr-portal: a static portal shared across institutions, with disjoint custom-ID ranges to avoid collisions.
Final validation via ACHILLES and the Data Quality Dashboard is an iterative cycle: run, fix, re-run — until quality is releasable.

Next article : OMOP SQL — Beginner level

OMOP ETL: an incremental approach

Semantic mapping, the heart of the work

Linkr’s mapping module

Import settings

Agentic-AI-assisted mapping

Metadata and traceability

Routing by domain_id

Building a MIMIC-IV → OMOP ETL, step by step

Step A — Extract source concepts

Step B — Retrieve the mapping table

Step C — Load the reference vocabularies into the database

Step D — Populate the OMOP tables

Step E — Validate the resulting database’s quality

What’s next?

Routing by `domain_id`