Summary
The Suggestions tab shows, for each source concept, OMOP candidates ranked by a combined score (syntactic + semantic + statistical + AI). Scores are not computed inside Linkr: they are produced by the Claude concept-mapping Skill, which runs in Claude Code against a ZIP export of the project. You then validate every mapping manually in the editor.
Why a separate tab
Mapping 1,000 local codes by hand against the 4 million concepts of the OMOP vocabulary is slow. Automated matching helps, but each algorithm family has its own bias:
- String comparison misses synonyms (“HR” vs “heart rate”).
- Semantic embeddings occasionally confuse close-but-different concepts (“serum creatinine” vs “urine creatinine”).
- AI validation adds clinical context but is too expensive to run against all 4 million candidates for every source concept.
The Suggestions tab combines several methods into a single score and surfaces the per-method details, so you keep the final decision.
The workflow
Export the project (ZIP: project.json, mappings.json, source-concepts.csv) → concept-mapping Skill (score computation · mapping proposals · LLM) → Suggestions tab (import the .parquet file · weight · validate)
- In Linkr — export the project as a ZIP from the Export tab. The ZIP contains project.json, mappings.json and source-concepts.csv.
- In Claude Code — run the concept-mapping skill (see below). It computes the scores, drafts mappings via an LLM, and produces a similarity-scores.parquet file.
- Back in Linkr — open the mapping editor, switch to Suggestions mode, click Manage → Import and load the .parquet.
The Suggestions tab in Linkr
| Score | Methods | Vocab | Concept name | Code | Std | |
|---|---|---|---|---|---|---|
| 92% | 95% · 88% · 91% | LOINC | Creatinine [Mass/volume] in Serum or Plasma | 2160-0 | S | |
| 81% | 83% · 79% · 82% | LOINC | Creatinine [Mass/volume] in Blood | 38483-4 | S | |
| 76% | 80% · 71% · 77% | LOINC | Creatinine [Moles/volume] in Serum or Plasma | 14682-9 | S | |
| 68% | 72% · 64% | SNOMED | Serum creatinine measurement | 113075003 | S | |
| 54% | 58% · 51% | LOINC | Creatinine renal clearance | 2164-2 | S | |
Similarity methods
Four method families can feed the candidate ranking. Each attacks the problem from a different angle, and their failure modes don’t overlap — which is why combining them beats any single family.
Syntactic
Compares the label character strings.
Semantic
Compares meaning via biomedical embeddings.
Statistical
Compares the distributions of numerical values.
AI
An LLM re-reads top candidates with the clinical context.
In Linkr, every method is represented next to the combined score by a coloured dot (colours above) followed by its percentage. On hover, a tooltip names the exact algorithm. This helps you understand why a score is high: a candidate scoring 92% from the syntactic method alone is likely a textual false friend, while one scoring 80% from semantic + AI is much more solid.
1. Syntactic — compare character strings
We compare label strings as-is, without any understanding of their meaning. Useful because source and OMOP labels often follow different writing conventions: case, spaces or underscores, accents, abbreviations, word order, punctuation. Three algorithms in the catalogue:
- Jaro-Winkler — letter-level proximity with a bonus for shared prefixes. Matches “hemoglobin_a1c” with “Hemoglobin A1c” despite the case and separator differences.
- Token-Sort — splits labels into words, sorts them alphabetically, then compares. Word-order agnostic (“blood pressure systolic” ≈ “systolic blood pressure”).
- N-gram IDF — splits into character n-grams (subsequences of N letters), weights each n-gram by its rarity in the vocabulary. Usagi’s historical method.
Example: the source label hemoglobin_a1c (no accents, no spaces) compared with the candidates Hemoglobin A1c, A1c hemoglobin, Hemoglobin and Glucose. Jaro-Winkler rewards letter-level proximity, with a bonus for a shared prefix; Token-Sort lets "A1c hemoglobin" match the source despite the word-order difference.
Strengths — catches near-literal matches despite writing variants (case, spaces, abbreviations, word order), very fast (milliseconds per pair).
Limits — does not account for synonyms: “creatinine” and “creat” are close, but “heart rate” and “pulse” share almost no characters and score near zero.
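As a rough illustration of the syntactic family, here is a minimal Token-Sort sketch. It is not the Skill's implementation: the `normalize` and `token_sort_similarity` helpers are hypothetical names, and the standard library's `difflib` ratio stands in for Jaro-Winkler, which is not part of the Python standard library.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase, strip accents, and turn separators into single spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def token_sort_similarity(a: str, b: str) -> float:
    """Compare two labels after sorting their words alphabetically."""
    sorted_a = " ".join(sorted(normalize(a).split()))
    sorted_b = " ".join(sorted(normalize(b).split()))
    # difflib's ratio is a stand-in string metric here, not Jaro-Winkler.
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


source = "hemoglobin_a1c"
for candidate in ["Hemoglobin A1c", "A1c hemoglobin", "Hemoglobin", "Glucose"]:
    print(f"{candidate:16} {token_sort_similarity(source, candidate):.2f}")
```

Both "Hemoglobin A1c" and "A1c hemoglobin" come out as perfect matches because normalisation removes case and separator differences and sorting removes word order. A synonym pair such as "heart rate" and "pulse" still scores very low, which is the limit described above.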
2. Semantic — compare meaning
Each label is encoded as a numerical vector through a sentence-embedding model, then we measure the angle between vectors via cosine similarity. Two labels with close meanings will produce close vectors, even if the words differ.
From label to embedding vector
BioLORD-2023-M projects each label into a 384-dimensional space. Here are 24 dimensions as a heatmap — each cell is one dimension, colour encodes the value.
[Figure: embedding heatmaps for "Heart rate", "Pulse" and "Blood pressure", with the cosine similarity of "Heart rate" vs "Pulse" and of "Heart rate" vs "Blood pressure".]
Cosine similarity: cos(θ) = (A · B) / (‖A‖ × ‖B‖)
The patterns for "Heart rate" and "Pulse" are nearly identical cell by cell → very close vectors. "Blood pressure" has a different pattern → distant vector. The closer the cosine score gets to 1, the closer the meanings.
Default model: BioLORD-2023-M, trained on biomedical text and multilingual (French, English, Spanish, German…).
Configurable alternatives — any Sentence-Transformers compatible BERT model:
- PubMedBERT, SapBERT, ClinicalBERT — biomedical-specialised but English-only.
- all-MiniLM-L6-v2 — general-purpose, English-only, much faster.
- paraphrase-multilingual-MiniLM-L12-v2, LaBSE — general-purpose, multilingual.
Multilingual coverage
If your source labels are in French (or any non-English language), stick with BioLORD-2023-M or switch to an explicitly multilingual model. PubMedBERT, SapBERT, ClinicalBERT and all-MiniLM-L6-v2 will produce poor-quality embeddings on non-English text.
Strengths — catches synonyms (“heart rate” ≈ “pulse”), rephrasings, terminology variants.
Limits — can conflate two close-but-distinct concepts (“serum creatinine” vs “urine creatinine”). Expensive to pre-compute over the ~4M OMOP concepts (GPU recommended).
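The cosine formula used by the semantic method can be sketched in a few lines. The three 4-dimensional vectors below are invented toy values chosen for illustration only; they are not real BioLORD-2023-M output, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumption: made up for this example, not model output).
heart_rate = [0.8, 0.1, 0.5, 0.2]
pulse = [0.7, 0.2, 0.6, 0.1]
blood_pressure = [0.1, 0.9, 0.2, 0.6]

print("heart rate vs pulse:         ", round(cosine_similarity(heart_rate, pulse), 2))
print("heart rate vs blood pressure:", round(cosine_similarity(heart_rate, blood_pressure), 2))
```

With these toy values the first pair scores close to 1 and the second much lower, mirroring the "Heart rate" / "Pulse" / "Blood pressure" comparison described in this section.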
3. Statistical — compare value distributions
For numerical variables (Measurement only), we compare the statistical distribution of values between source concept and candidates: min, P25, median, P75, max, mean, standard deviation. Two variables taking values in the same range with the same shape are probably the same measurement.
Concretely, we quantify the gap between the two distributions with one of two measures:
- Kolmogorov-Smirnov test (KS) — maximum gap between cumulative distribution functions. KS = 0 → identical distributions; KS close to 1 → very different distributions.
- Wasserstein distance — “cost” of transporting the mass of one distribution onto the other. More robust than KS when supports are shifted (unit change).
[Figure: smoothed value distributions (kernel density estimates) of the source concept, serum creatinine (µmol/L; 12,450 measurements, 1,842 patients), against three candidates: LOINC 2160-0, Creatinine [Mass/volume] in Serum or Plasma (unit: mg/dL); LOINC 14682-9, Creatinine [Moles/volume] in Serum or Plasma (unit: µmol/L); LOINC 2951-2, Sodium [Moles/volume] in Serum or Plasma (unit: mmol/L).]
Kolmogorov-Smirnov test: KS close to 0 = identical distributions; KS > 0.5 = very different distributions. Smoothed curves (kernel density estimate) let you compare shape and position without the distributions hiding each other.
Strengths — works even when the label is ambiguous or missing. Very discriminating: hard to confuse creatinine (60–120 µmol/L) with sodium (135–145 mmol/L).
Limits — only applicable to numerical variables with enough values for the distribution to be representative. Sensitive to data-entry artifacts (outliers, mixed units).
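To make the two measures concrete, here is a standard-library sketch of the two-sample KS statistic and of the 1-D Wasserstein distance (restricted to equal-sized samples for simplicity). The function names and the sample values are assumptions made for this example; the samples are invented but stay within the ranges quoted above.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Maximum gap between the two empirical cumulative distribution functions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    max_gap = 0.0
    # The largest gap between two step functions is reached at a data point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def wasserstein_equal_size(a, b):
    """1-D Wasserstein distance for two samples of the same size."""
    if len(a) != len(b):
        raise ValueError("this simplified version needs equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Invented samples (assumption), in the units used in the text.
source_creatinine = [62, 75, 80, 88, 95, 104, 118]      # µmol/L
candidate_creatinine = [65, 72, 84, 90, 99, 110, 115]   # µmol/L
candidate_sodium = [136, 138, 139, 140, 141, 143, 144]  # mmol/L

print("KS vs creatinine:", round(ks_statistic(source_creatinine, candidate_creatinine), 3))
print("KS vs sodium:    ", round(ks_statistic(source_creatinine, candidate_sodium), 3))
```

The creatinine pair gives a small KS value, while the sodium candidate gives a KS of 1 because the two samples do not overlap at all. A production script would more likely rely on a statistics library than on hand-written functions.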
4. Agentic AI — let an LLM reason about it
An LLM (Claude), driven agentically (the LLM decides which tools to invoke: reading labels, browsing the OMOP hierarchy, computing on source values), re-reads the top N candidates from the other methods with all the context available: source label, statistics, units, frequency, value examples. It proposes one or more mappings carrying a SKOS equivalence (exactMatch, narrowMatch, broadMatch, relatedMatch) and a short clinical justification.
Example: TA_systolique_invasive (ICCA · M0118 · "Invasive systolic arterial pressure" · 24,380 measurements · median 122 mmHg)
Pre-ranked candidates:
- SNOMED 251071003 — Invasive systolic arterial pressure: 91%
- SNOMED 72313002 — Systolic arterial pressure: 78%
- LOINC 8480-6 — Systolic blood pressure: 74%
Agent's reasoning: the label "TA_systolique_invasive" refers to systolic arterial pressure measured via an arterial catheter (arterial line), not by cuff. SNOMED 251071003 matches this definition exactly, so it is a direct mapping. LOINC 8480-6 is a broader concept (systolic, any method): the target is more general than the source, which is written broadMatch in SKOS. The observed distribution (median 122 mmHg, right tail up to ~180) is consistent with an invasive ICU measurement.
Proposed mappings:
- SNOMED 251071003 — Invasive systolic arterial pressure. Confidence: 97%. skos:exactMatch — The target concept describes precisely the same measurement.
- LOINC 8480-6 — Systolic blood pressure. Confidence: 88%. skos:broadMatch — The target is more general (any measurement method), so broader than the invasive source.
Alternatives to Claude Code
The concept-mapping Skill ships for Claude Code, but the underlying idea — an agentic CLI orchestrating Python scripts and writing a .parquet of scores — has nothing Claude-specific. Other agentic CLIs should be able to run an equivalent flow: Mistral Vibe CLI, Codex CLI, Gemini CLI… In practice only Claude Code has been tested so far — porting the Skill to another CLI would require rewriting SKILL.md in the target tool’s format.
On the local-model side, Claude Code can be wired to Ollama to run an open-source LLM (Llama, Mistral, Qwen…) on your own machine instead of hitting the Anthropic API. Upsides: no network calls, no per-token cost, data never leaves your workstation. Downsides: reasoning quality below frontier models on complex clinical cases, and you need the hardware (at least 16 GB of VRAM for a useful model).
Strengths — the only method that “understands” clinical semantics (qualifiers, measurement context, nuanced equivalences). Can say “this is not exact, it’s a sub-type” via narrowMatch.
Limits — slow and expensive (tokenised API calls), risk of hallucinations on rare cases, still requires human validation.
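The shape of an agent proposal (target concept, SKOS equivalence, confidence, justification) can be pictured as a small validated record. This is only a sketch: the `MappingProposal` class and its field names are hypothetical and are not taken from the Skill's code.

```python
from dataclasses import dataclass

# The four SKOS equivalences named in this section.
SKOS_EQUIVALENCES = ("exactMatch", "narrowMatch", "broadMatch", "relatedMatch")


@dataclass(frozen=True)
class MappingProposal:
    """One mapping proposed by the agent (hypothetical structure)."""
    target: str
    equivalence: str
    confidence: float
    justification: str

    def __post_init__(self):
        if self.equivalence not in SKOS_EQUIVALENCES:
            raise ValueError(f"unknown SKOS equivalence: {self.equivalence}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


proposals = [
    MappingProposal("SNOMED 251071003", "exactMatch", 0.97,
                    "The target concept describes precisely the same measurement."),
    MappingProposal("LOINC 8480-6", "broadMatch", 0.88,
                    "The target is more general (any measurement method)."),
]
for p in proposals:
    print(p.target, p.equivalence, p.confidence)
```

Rejecting unknown equivalences at construction time is one cheap guard against a hallucinated relation name, although it does not replace human validation.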
Frugality principle — start with the cheapest method
The methods sit at different resource tiers, roughly grouped as:
- Syntactic and statistical — light CPU, local computations on strings or on values already in memory. Order of magnitude: milliseconds per pair.
- Semantic (BERT / embeddings) — a GPU is recommended for inference and for pre-computing the ~4M OMOP vectors. Order of magnitude: minutes to hours depending on hardware, then near-free at query time.
- Agentic AI — tokenised LLM calls, several tool round-trips per concept. Order of magnitude: seconds to minutes per concept, with real financial cost (cents to tens of cents per request, depending on the model).
The rule: don’t reach for a higher tier when a lower one suffices.
- Run syntactic + statistical first — it’s almost free, and already resolves a large chunk of easy cases.
- The semantic method takes over on unresolved concepts (synonyms, rephrasings, languages).
- Agentic AI should only run on the top N candidates already pre-ranked by lower tiers — never blindly on the 4M OMOP concepts. An agent-driven mapping can consume several dozen times more tokens than a single LLM call, because the agent reads the hierarchy, browses synonyms, recomputes, and so on.
It’s at once a question of financial cost (tokens, GPU), environmental impact (carbon footprint of LLM inference) and speed: running a 1,000-concept project entirely through agentic AI would cost tens of euros and take several hours, whereas a tiered pipeline (syntactic + statistical first, then semantic, then agentic AI on the residual) handles the same load in minutes for a few cents.
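The escalation rule above can be sketched as a loop over tiers, cheapest first. Everything here is an assumption made for illustration: the `run_tiered` helper, the 0.9 threshold and the toy scores are hypothetical, and the sketch reduces each concept to a single best score, whereas the real agent reviews the top N candidates per concept.

```python
def run_tiered(concepts, tiers, threshold=0.9):
    """Send each concept to the next tier only if cheaper tiers did not resolve it.

    tiers is an ordered list of (name, score_fn) pairs, cheapest first.
    Returns ({concept: (tier_name, score)}, [unresolved concepts]).
    """
    resolved = {}
    pending = list(concepts)
    for name, score_fn in tiers:
        still_pending = []
        for concept in pending:
            score = score_fn(concept)
            if score >= threshold:
                resolved[concept] = (name, score)
            else:
                still_pending.append(concept)
        pending = still_pending
    return resolved, pending


# Toy best-candidate scores per tier (invented values).
cheap = {"creatinine": 0.95, "pulse": 0.10, "TA_systolique_invasive": 0.40}
semantic = {"pulse": 0.93, "TA_systolique_invasive": 0.70}
agentic = {"TA_systolique_invasive": 0.97}

agent_calls = []  # records which concepts reached the expensive tier


def agentic_score(concept):
    agent_calls.append(concept)
    return agentic.get(concept, 0.0)


tiers = [
    ("syntactic+statistical", lambda c: cheap.get(c, 0.0)),
    ("semantic", lambda c: semantic.get(c, 0.0)),
    ("agentic", agentic_score),
]
resolved, unresolved = run_tiered(list(cheap), tiers)
print(resolved)
print("agent calls:", agent_calls)
```

In this toy run only one of the three concepts reaches the agentic tier, which is the point of the frugality principle.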
The combined score
The combined score (left column, green bar) is a weighted average of each method’s score. Default weights are:
| Method | Weight | Role |
|---|---|---|
| Syntactic | ×1 | Catches near-literal matches despite writing variants. |
| Statistical | ×1 | Compares value distributions (creatinine vs sodium). |
| Semantic | ×2 | Picks up synonyms and rephrasings. |
| Agentic AI | ×3 | Final validation by an agentically-driven LLM. |
The Manage suggestions button (top left) opens a dialog with sliders to tune each weight in real time — the table re-sorts automatically. From there you can also see the number of loaded scores, import a new file, or wipe everything.
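A minimal sketch of the weighted average, using the default weights from the table. Two points are assumptions rather than documented behaviour: methods are grouped by the family prefix of their identifier (the part before the slash, as in syntactic/jaro-winkler), and a candidate is averaged only over the methods that actually scored it.

```python
DEFAULT_WEIGHTS = {"syntactic": 1, "statistical": 1, "semantic": 2, "ai": 3}


def combined_score(method_scores, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-method scores (each score in [0, 1])."""
    total = 0.0
    weight_sum = 0.0
    for method, score in method_scores.items():
        family = method.split("/", 1)[0]  # "semantic/biolord" -> "semantic"
        weight = weights[family]
        total += weight * score
        weight_sum += weight
    # Assumption: methods that did not score this candidate are simply left out.
    return total / weight_sum if weight_sum else 0.0


scores = {
    "syntactic/jaro-winkler": 0.83,
    "semantic/biolord": 0.91,
    "ai/claude-sonnet-4-6": 0.97,
}
print(f"{combined_score(scores):.3f}")  # (0.83*1 + 0.91*2 + 0.97*3) / 6
```

Moving a slider in the Manage suggestions dialog corresponds to passing a different `weights` mapping: with the AI weight set to 0, the same candidate is ranked on its syntactic and semantic scores alone.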
[Dialog "Manage suggestions": tune the weight of each method in the combined score. Sections: method weights, similarity-scores.parquet.]
Selection and mapping
| Candidate | Score |
|---|---|
| SNOMED 251071003 — Invasive systolic arterial pressure | 91% |
| SNOMED 72313002 — Systolic arterial pressure | 78% |
| LOINC 8480-6 — Systolic blood pressure | 74% |
- Click a row to select it. The action bar at the top then shows the usual mapping buttons (Exact match + SKOS dropdown + comment button).
- Candidates already mapped for this source concept are greyed out and not re-selectable.
- The ⓘ icon at the row end opens the target OMOP concept’s detail (description, synonyms, hierarchy).
The main alignment button “Exact” is pre-filled with the SKOS equivalence proposed by the agentic AI (exactMatch, narrowMatch, broadMatch, relatedMatch) — you can accept it in one click, or open the dropdown to pick a different equivalence if you disagree.
The Claude concept-mapping Skill
The skill is located in the Linkr repository, under .claude/skills/concept-mapping/. It is a standalone folder that holds:
- SKILL.md — the orchestrator, read by Claude Code when you type /concept-mapping.
- reference.md — DuckDB patterns, TypeScript types, domain heuristics, parquet schema.
- scripts/embed_concepts.py — generates BioLORD embeddings for the OMOP vocabulary (run once per vocabulary release).
- scripts/compute_scores.py — computes scores for a given project (run once per project).
- scripts/update_state.py — recomputes state.json (progress, sessions, file presence).
- review-template/ — static HTML dashboard copied into the project folder the first time you launch the review server.
Plus two companion sub-skills, invoked automatically by the orchestrator:
- concept-mapping-ai — Measurement, Condition, Procedure, Observation.
- concept-mapping-drug — drugs (RxNorm, dosages, pharmaceutical forms).
Requirements
| Resource | Why |
|---|---|
| Claude Code installed (claude CLI) | To run the skill. |
| OHDSI vocabularies locally (parquet or CSV) | Downloaded from ATHENA. |
| Python 3 + sentence-transformers, pandas, duckdb, pyarrow | For the embedding and scoring scripts. |
| ~10–30 GB of disk | The concept_embeddings.parquet file for ~4M concepts weighs several GB. |
| GPU recommended | The initial embedding pass can take hours on CPU. |
Configuration
Rather than typing paths every run, create a config.local.json at the project root:
{
  "concept-mapping": {
    "vocab_dir": "/path/to/ohdsi-vocabularies",
    "models_dir": "/path/to/bert-models-cache",
    "projects_dir": "/path/to/mapping-projects"
  }
}
Details of the three keys:
- vocab_dir — folder holding the OHDSI vocabularies downloaded from ATHENA (CONCEPT.parquet, CONCEPT_SYNONYM.parquet, CONCEPT_RELATIONSHIP.parquet, CONCEPT_ANCESTOR.parquet, plus VOCABULARY.parquet and DOMAIN.parquet). Roughly 10–30 GB depending on the active vocabularies. The skill loads these files into DuckDB at launch.
- models_dir — local cache for sentence-embedding models (by default BioLORD-2023-M, ~500 MB). If you switch model (PubMedBERT, SapBERT…), it is downloaded here on first use. Avoids re-downloading several gigabytes for every project.
- projects_dir — root of mapping projects. Each subfolder is a project exported from Linkr (project.json, mappings.json, source-concepts.csv) and ends up receiving the score files (similarity-scores.parquet, source_embeddings.parquet, state.json).
The skill reads these values at launch and stops asking for the paths.
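For readers who want to check their file before launching the skill, a small loader along these lines can help. It is a sketch, not the skill's own code: the `load_config` function is hypothetical, and treating all three keys as mandatory is an assumption.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("vocab_dir", "models_dir", "projects_dir")


def load_config(path="config.local.json"):
    """Read the "concept-mapping" section and return its three paths."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    section = raw.get("concept-mapping", {})
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        # Assumption: all three keys are treated as mandatory in this sketch.
        raise KeyError(f"missing keys in concept-mapping: {missing}")
    return {key: Path(section[key]) for key in REQUIRED_KEYS}
```

Calling `load_config()` from the project root returns a dictionary of `Path` objects, for example `load_config()["vocab_dir"]`, and raises a `KeyError` that names any key missing from the file.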
Skill steps
The orchestrator runs through the following steps automatically:
Resume where you left off
If you’ve worked on this project before, the skill summarises where you stand (how many concepts mapped, which methods already ran, when the last session was) and offers to open a local dashboard in your browser to visualise progress.
Read the project
The best path is to export your project from Linkr (Export tab) and give the skill the path to the ZIP or the extracted folder. The skill finds project.json, mappings.json and source-concepts.csv inside.
Pick which concepts to process
You decide the scope: all unmapped concepts, a specific clinical category, the most frequent ones, or a list of codes you provide. No need to run through everything in one go.
Load the OMOP vocabularies
The skill loads the OHDSI tables it needs into memory (concept names, synonyms, hierarchy) so it can compare the source concept against each candidate quickly.
Compute syntactic + semantic + statistical scores
The skill runs the cheaper methods first: string comparison, embeddings, distribution comparison. Computation can take a while on a large vocabulary (up to a few hours on CPU), but it’s resumable — if you interrupt and relaunch, work already done isn’t redone.
Let the agentic AI decide
The skill hands over to a domain-specialised sub-skill (measurements, conditions, procedures, drugs). The LLM reads each candidate with its clinical context and proposes a justified mapping with a SKOS equivalence.
Write the results
All suggestions land in a similarity-scores.parquet file that you re-import into Linkr via Manage suggestions → Import. The agentic AI then appears as one more method in the Suggestions tab.
The agent reuses the similarities produced by the methods described above (syntactic, semantic, statistical) to pre-rank candidates before deciding; it does not invent a new scoring scheme. It simply adds its own ai/<model-id> line next to the others, with a SKOS equivalence and a comment justifying its choice.
Output files
The skill writes two files to the project folder:
- source_embeddings.parquet — the embedding vectors of source concepts. Kept so embeddings can be reused on the next run without recomputing everything.
- similarity-scores.parquet — one score per (source, candidate, method) triple. This is the file you re-import into Linkr.
Preview of source_embeddings.parquet:
| source_vocabulary_id | source_concept_code | source_concept_name | embedding (excerpt) | model | created_at |
|---|---|---|---|---|---|
| ICCA | M0118 | TA_systolique_invasive | [ 0.42,-0.18, 0.71, 0.05,-0.33, 0.58, 0.22,-0.41, …] | BioLORD-2023-M | 2026-05-14T09:31:02Z |
| ICCA | M0042 | Frequence_cardiaque | [ 0.39,-0.22, 0.68, 0.08,-0.30, 0.61, 0.19,-0.38, …] | BioLORD-2023-M | 2026-05-14T09:31:02Z |
| ICCA | M0071 | Natremie | [-0.15, 0.44, 0.12, 0.59, 0.31,-0.22,-0.48, 0.27, …] | BioLORD-2023-M | 2026-05-14T09:31:02Z |
The embedding column holds the full vector (384 values); only the first 8 dimensions are shown here for readability.
Preview of similarity-scores.parquet for a source concept (invasive systolic BP):
| source_vocabulary_id | source_concept_code | concept_id | method | score | equivalence | comment | created_at |
|---|---|---|---|---|---|---|---|
| ICCA | M0118 | 4353843 | syntactic/jaro-winkler | 0.83 | — | — | 2026-05-14T09:32:11Z |
| ICCA | M0118 | 4353843 | semantic/biolord | 0.91 | — | — | 2026-05-14T09:34:02Z |
| ICCA | M0118 | 4353843 | ai/claude-sonnet-4-6 | 0.97 | exactMatch | Concept dedicated to invasive measurement (arterial catheter). | 2026-05-14T09:35:48Z |
| ICCA | M0118 | 3004249 | ai/claude-sonnet-4-6 | 0.88 | broadMatch | Broader LOINC, covers any measurement method. | 2026-05-14T09:35:48Z |
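Because the file is in long format (one row per source, candidate and method), a consumer typically regroups rows per candidate before displaying them. The sketch below does this in plain Python on the four preview rows, with the comment and created_at columns left out for brevity; the `group_by_candidate` helper is a hypothetical name, and a real reader would load the rows from the parquet file instead of a literal list.

```python
from collections import defaultdict

# The four preview rows from the table above (comment and created_at omitted).
rows = [
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118", "concept_id": 4353843,
     "method": "syntactic/jaro-winkler", "score": 0.83, "equivalence": None},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118", "concept_id": 4353843,
     "method": "semantic/biolord", "score": 0.91, "equivalence": None},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118", "concept_id": 4353843,
     "method": "ai/claude-sonnet-4-6", "score": 0.97, "equivalence": "exactMatch"},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118", "concept_id": 3004249,
     "method": "ai/claude-sonnet-4-6", "score": 0.88, "equivalence": "broadMatch"},
]


def group_by_candidate(rows):
    """Regroup long-format rows as {(vocabulary, code, concept_id): {method: row}}."""
    grouped = defaultdict(dict)
    for row in rows:
        key = (row["source_vocabulary_id"], row["source_concept_code"], row["concept_id"])
        grouped[key][row["method"]] = row
    return dict(grouped)


for key, methods in group_by_candidate(rows).items():
    summary = ", ".join(f"{m}={r['score']}" for m, r in methods.items())
    print(key, "->", summary)
```

Each group then carries everything needed for one row of the Suggestions tab: the per-method percentages and, when an ai/ row is present, the proposed SKOS equivalence.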
Resume on interrupt
The scripts write their output incrementally, which lets you resume where the previous run stopped:
- embed_concepts.py saves progress every 50 batches (~25,000 concepts) — you can interrupt without losing work.
- compute_scores.py saves every 100 source concepts — already-scored (source_vocabulary_id, source_concept_code) pairs are skipped on restart.
This matters for full vocabularies (4M concepts) that can take hours.
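The skip-and-checkpoint pattern described above can be sketched as follows. This is not the code of compute_scores.py: the `score_all` function is hypothetical, and a JSON checkpoint stands in for the parquet output to keep the example dependency-free.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Return the scores saved by a previous run, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def score_all(sources, score_fn, path, save_every=100):
    """Score (vocabulary_id, concept_code) pairs, skipping those already saved."""
    done = load_checkpoint(path)
    since_save = 0
    for vocab, code in sources:
        key = f"{vocab}|{code}"
        if key in done:
            continue  # scored in a previous run
        done[key] = score_fn(vocab, code)
        since_save += 1
        if since_save >= save_every:
            path.write_text(json.dumps(done))  # periodic checkpoint
            since_save = 0
    path.write_text(json.dumps(done))  # final save
    return done
```

If the process stops between two checkpoints, at most `save_every` source concepts are recomputed on the next launch; everything saved before that is skipped.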