Suggestions

AI-assisted pre-mapping: combined multi-method score computed outside Linkr by the Claude concept-mapping Skill.

Summary

The Suggestions tab shows, for each source concept, OMOP candidates ranked by a combined score (syntactic + semantic + statistical + AI). Scores are not computed inside Linkr: they are produced by the Claude concept-mapping Skill, which runs in Claude Code against a ZIP export of the project. You then validate every mapping manually in the editor.

Client Backend

Why a separate tab

Mapping 1,000 local codes by hand against the 4 million concepts of the OMOP vocabulary is slow. Automated matching helps, but each algorithm family has its own weakness:

String comparison misses synonyms (“HR” vs “heart rate”).
Semantic embeddings occasionally confuse close-but-different concepts (“serum creatinine” vs “urine creatinine”).
AI validation adds clinical context but is expensive to run over 4 million pairs.

The Suggestions tab combines several methods into a single score and surfaces the per-method details, so you keep the final decision.

The workflow

Figure (workflow diagram): Linkr, Export the project (ZIP: project.json, mappings.json, source-concepts.csv) → Claude Code, concept-mapping Skill (score computation · mapping proposals · LLM) → Linkr, Suggestions tab (import the .parquet file · weight · validate).

Three steps: export the project, run the Skill in Claude Code, re-import the scores file into Linkr.

In Linkr: export the project as a ZIP from the Export tab. The ZIP contains project.json, mappings.json and source-concepts.csv.
In Claude Code: run the concept-mapping Skill (see below). It computes the scores, drafts mappings via an LLM, and produces a similarity-scores.parquet file.
Back in Linkr: open the mapping editor, switch to Suggestions mode, click Manage suggestions → Import and load the .parquet file.

The Suggestions tab in Linkr

Tabs: Concept sets · Search · Suggestions

Score	Methods	Vocab	Concept name	Code	Std
92%	95% · 88% · 91%	LOINC	Creatinine [Mass/volume] in Serum or Plasma	2160-0	S
81%	83% · 79% · 82%	LOINC	Creatinine [Mass/volume] in Blood	38483-4	S
76%	80% · 71% · 77%	LOINC	Creatinine [Moles/volume] in Serum or Plasma	14682-9	S
68%	72% · 64%	SNOMED	Serum creatinine measurement	113075003	S
54%	58% · 51%	LOINC	Creatinine renal clearance	2164-2	S

8 / 24 results

The Suggestions tab: combined score on the left, colour-coded method dots, and an action bar to map or manage scores.

Similarity methods

Four method families can feed the candidate ranking. Each attacks the problem from a different angle, and their failure modes largely differ, which is why combining them tends to work better than relying on any single family.

Syntactic

Compares the label character strings.

Semantic

Compares meaning via biomedical embeddings.

Statistical

Compares the distributions of numerical values.

Agentic AI

An LLM re-reads the top candidates with the clinical context.

In Linkr, each method appears next to the combined score as a coloured dot (one colour per method family) followed by its percentage. On hover, a tooltip names the exact algorithm. This helps you understand why a score is high: a candidate scoring 92% from the syntactic method alone may be a textual false friend, while one scoring 80% from semantic + AI is much more solid.

1. Syntactic — compare character strings

We compare label strings as-is, without any understanding of their meaning. Useful because source and OMOP labels often follow different writing conventions: case, spaces or underscores, accents, abbreviations, word order, punctuation. Three algorithms in the catalogue:

Jaro-Winkler — letter-level proximity with a bonus for shared prefixes. Matches “hemoglobin_a1c” with “Hemoglobin A1c” despite the case and separator differences.
Token-Sort — splits labels into words, sorts them alphabetically, then compares. Word-order agnostic (“blood pressure systolic” ≈ “systolic blood pressure”).
N-gram IDF — splits into character n-grams (subsequences of N letters), weights each n-gram by its rarity in the vocabulary. Usagi’s historical method.

Source label: hemoglobin_a1c (no accents, no spaces)

Candidate	Score
Hemoglobin A1c	0.93
A1c hemoglobin	0.78
Hemoglobin	0.62
Glucose	0.14

Jaro-Winkler: letter-level proximity with a bonus for a shared prefix. Token-Sort lets "A1c hemoglobin" match the source despite the word-order difference.

Jaro-Winkler applied to a snake_case source label: the OMOP labels with the highest letter-level proximity bubble to the top, despite differences in case, spacing and word order.

Strengths: catches near-literal matches despite writing variants (case, spaces, abbreviations, word order); very fast (milliseconds per pair).

Limits: does not account for synonyms. “creatinine” and “creat” are close, but “heart rate” and “pulse” share almost no characters and score low.
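As an illustration of the two simplest syntactic algorithms, here is a minimal pure-Python sketch of Jaro-Winkler and Token-Sort. It is not the code shipped in the Skill: the `normalize` helper, the 4-character prefix cap and the 0.1 prefix scale are assumptions (the latter two are the values commonly used for Jaro-Winkler).

```python
import re


def normalize(label: str) -> str:
    """Lowercase and collapse underscores/punctuation into single spaces (assumed pre-processing)."""
    return " ".join(re.sub(r"[\W_]+", " ", label.lower()).split())


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if they sit within this distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    m1 = [c for c, used in zip(s1, used1) if used]
    m2 = [c for c, used in zip(s2, used2) if used]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the shared prefix (capped at 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def token_sort(label: str) -> str:
    """Normalize, then sort the words so that word order no longer matters."""
    return " ".join(sorted(normalize(label).split()))
```

A typical use would be `jaro_winkler(token_sort(source_label), token_sort(omop_label))`: sorting the tokens first makes "blood pressure systolic" and "systolic blood pressure" compare as identical strings.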

2. Semantic — compare meaning

Each label is encoded as a numerical vector through a sentence-embedding model, then we measure the angle between vectors via cosine similarity. Two labels with close meanings will produce close vectors, even if the words differ.

From label to embedding vector

BioLORD-2023-M projects each label into a 384-dimensional space. The figure shows 24 of these dimensions as a heatmap for three labels ("Heart rate", "Pulse", "Blood pressure"): each cell is one dimension, and colour encodes its value on a scale from −1 to +1.

Cosine similarity

cos(θ) = (A · B) / (‖A‖ × ‖B‖)

Pair	Cosine similarity
Heart rate vs Pulse	0.94
Heart rate vs Blood pressure	0.21

The patterns for "Heart rate" and "Pulse" are nearly identical cell by cell, so their vectors are very close. "Blood pressure" has a different pattern, so its vector is distant. The closer the cosine score gets to 1, the closer the meanings.

'Heart rate' and 'pulse' have almost the same vector (cos ≈ 0.94), even though the characters differ entirely. 'Blood pressure' sits in another region of the space.
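The cosine formula can be sketched in a few lines of standard-library Python. This is only an illustration of the computation; in practice the vectors would come from the embedding model (for instance through the `sentence-transformers` package), and returning 0.0 for a zero-length vector is an assumption made here to avoid a division by zero.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero vector as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the score depends only on the angle between the vectors, scaling a vector (for example `[1, 2, 3]` versus `[2, 4, 6]`) does not change the similarity.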

Default model: BioLORD-2023-M, trained on biomedical text and multilingual (French, English, Spanish, German…).

Configurable alternatives — any Sentence-Transformers compatible BERT model:

PubMedBERT, SapBERT, ClinicalBERT — biomedical-specialised but English-only.
all-MiniLM-L6-v2 — general-purpose, English-only, much faster.
paraphrase-multilingual-MiniLM-L12-v2, LaBSE — general-purpose, multilingual.

Multilingual coverage

If your source labels are in French (or any non-English language), stick with BioLORD-2023-M or switch to an explicitly multilingual model. PubMedBERT, SapBERT, ClinicalBERT and all-MiniLM-L6-v2 will produce poor-quality embeddings on non-English text.

Strengths — catches synonyms (“heart rate” ≈ “pulse”), rephrasings, terminology variants.

Limits — can conflate two close-but-distinct concepts (“serum creatinine” vs “urine creatinine”). Expensive to pre-compute over the ~4M OMOP concepts (GPU recommended).

3. Statistical — compare value distributions

For numerical variables (Measurement only), we compare the statistical distribution of values between source concept and candidates: min, P25, median, P75, max, mean, standard deviation. Two variables taking values in the same range with the same shape are probably the same measurement.

Concretely, we quantify the gap between the two distributions with one of two measures:

Kolmogorov-Smirnov test (KS) — maximum gap between cumulative distribution functions. KS = 0 → identical distributions; KS close to 1 → very different distributions.
Wasserstein distance — “cost” of transporting the mass of one distribution onto the other. More robust than KS when supports are shifted (unit change).
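Both measures can be written directly from their definitions on empirical cumulative distribution functions. The sketch below uses only the standard library and is an illustration, not the Skill's implementation; it assumes both samples are non-empty and expressed in the same unit (the function names are hypothetical).

```python
from bisect import bisect_right
from typing import List, Sequence


def _ecdf(sorted_values: List[float], x: float) -> float:
    """Empirical CDF: share of values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    return max(abs(_ecdf(sa, x) - _ecdf(sb, x)) for x in sa + sb)


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein distance: area between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    grid = sorted(set(sa + sb))
    return sum(abs(_ecdf(sa, x) - _ecdf(sb, x)) * (nxt - x)
               for x, nxt in zip(grid, grid[1:]))
```

The KS statistic is bounded between 0 and 1, whereas the Wasserstein distance is expressed in the unit of the values themselves, so it is only comparable across candidates once values share a unit or have been rescaled.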

Source concept: serum creatinine (µmol/L), 12,450 measurements, 1,842 patients

Candidate	Concept name	Unit	KS	Wasserstein	Verdict
LOINC 2160-0	Creatinine [Mass/volume] in Serum or Plasma	mg/dL	0.08	0.12	Likely match
LOINC 14682-9	Creatinine [Moles/volume] in Serum or Plasma	µmol/L	0.06	0.09	Likely match
LOINC 2951-2	Sodium [Moles/volume] in Serum or Plasma	mmol/L	0.62	0.71	Different distribution

Kolmogorov-Smirnov test: KS close to 0 = identical distributions; KS > 0.5 = very different distributions. Smoothed curves (kernel density estimate) let you compare shape and position without the distributions hiding each other.

The source distribution (creatinine in µmol/L) overlaps almost perfectly with two LOINC creatinine concepts, and diverges sharply from sodium — which sits in an entirely different range.

Strengths — works even when the label is ambiguous or missing. Very discriminating: hard to confuse creatinine (60–120 µmol/L) with sodium (135–145 mmol/L).

Limits — only applicable to numerical variables with enough values for the distribution to be representative. Sensitive to data-entry artifacts (outliers, mixed units).

4. Agentic AI — let an LLM reason about it

An LLM (Claude), driven agentically (the LLM decides which tools to invoke: reading labels, browsing the OMOP hierarchy, computing on source values), re-reads the top N candidates from the other methods with all the context available: source label, statistics, units, frequency, value examples. It proposes one or more mappings carrying a SKOS equivalence (exactMatch, narrowMatch, broadMatch, relatedMatch) and a short clinical justification.

Source concept

TA_systolique_invasive (ICCA · M0118 · "Invasive systolic arterial pressure" · 24,380 measurements · median 122 mmHg)

Pre-ranked candidates

Candidate	Concept name	Score
SNOMED 251071003	Invasive systolic arterial pressure	91%
SNOMED 72313002	Systolic arterial pressure	78%
LOINC 8480-6	Systolic blood pressure	74%

Agentic AI reasoning

The label "TA_systolique_invasive" refers to systolic arterial pressure measured via an arterial catheter (arterial line), not by cuff. SNOMED 251071003 matches this definition exactly, so it is a direct mapping. LOINC 8480-6 is a broader concept (systolic, any method): the target is more general than the source, which is written broadMatch in SKOS. The observed distribution (median 122 mmHg, right tail up to ~180) is consistent with an invasive ICU measurement.

Proposed mappings

Target	Concept name	Confidence	SKOS equivalence	Rationale
SNOMED 251071003	Invasive systolic arterial pressure	97%	skos:exactMatch	The target concept describes precisely the same measurement.
LOINC 8480-6	Systolic blood pressure	88%	skos:broadMatch	The target is more general (any measurement method), so broader than the invasive source.

The agentic AI re-reads the top three candidates, identifies SNOMED 251071003 as the exact match (a concept dedicated to invasive measurement), and additionally proposes LOINC 8480-6 as a broadMatch since the latter covers systolic pressure regardless of method.
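One way to picture what the agent hands back is a small typed record per proposed mapping. The sketch below is hypothetical: the class and field names are assumptions chosen for illustration, and only the four SKOS equivalence values come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class SkosEquivalence(str, Enum):
    """The four SKOS mapping relations mentioned above."""
    EXACT = "exactMatch"
    NARROW = "narrowMatch"
    BROAD = "broadMatch"
    RELATED = "relatedMatch"


@dataclass(frozen=True)
class MappingProposal:
    """One mapping proposed by the agent (hypothetical structure)."""
    source_code: str
    target_vocabulary: str
    target_code: str
    equivalence: SkosEquivalence
    confidence: float  # between 0.0 and 1.0
    justification: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.justification.strip():
            raise ValueError("a short clinical justification is required")


# The two proposals from the example above.
proposals = [
    MappingProposal("M0118", "SNOMED", "251071003", SkosEquivalence.EXACT, 0.97,
                    "Concept dedicated to invasive measurement."),
    MappingProposal("M0118", "LOINC", "8480-6", SkosEquivalence("broadMatch"), 0.88,
                    "Covers systolic pressure regardless of method."),
]
```

Validating the equivalence and confidence at construction time means a malformed proposal fails early rather than surfacing during human review.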

Alternatives to Claude Code

The concept-mapping Skill ships for Claude Code, but the underlying idea — an agentic CLI orchestrating Python scripts and writing a .parquet of scores — has nothing Claude-specific. Other agentic CLIs should be able to run an equivalent flow: Mistral Vibe CLI, Codex CLI, Gemini CLI… In practice only Claude Code has been tested so far — porting the Skill to another CLI would require rewriting SKILL.md in the target tool’s format.

On the local-model side, Claude Code can be wired to Ollama to run an open-source LLM (Llama, Mistral, Qwen…) on your own machine instead of hitting the Anthropic API. Upsides: no network calls, no per-token cost, data never leaves your workstation. Downsides: reasoning quality below frontier models on complex clinical cases, and you need the hardware (at least 16 GB of VRAM for a useful model).

Strengths — the only method that “understands” clinical semantics (qualifiers, measurement context, nuanced equivalences). Can say “this is not exact, it’s a sub-type” via narrowMatch.

Limits — slow and expensive (tokenised API calls), risk of hallucinations on rare cases, still requires human validation.

Frugality principle — start with the cheapest method

The methods sit at different resource tiers, roughly grouped as:

Syntactic and statistical — light CPU, local computations on strings or on values already in memory. Order of magnitude: milliseconds per pair.
Semantic (BERT / embeddings) — a GPU is recommended for inference and for pre-computing the ~4M OMOP vectors. Order of magnitude: minutes to hours depending on hardware, then near-free at query time.
Agentic AI — tokenised LLM calls, several tool round-trips per concept. Order of magnitude: seconds to minutes per concept, with real financial cost (cents to tens of cents per request, depending on the model).

The rule: don’t reach for a higher tier when a lower one suffices.

Run syntactic + statistical first — it’s almost free, and already resolves a large chunk of easy cases.
The semantic method takes over on unresolved concepts (synonyms, rephrasings, languages).
Agentic AI should only run on the top N candidates already pre-ranked by lower tiers — never blindly on the 4M OMOP concepts. An agent-driven mapping can consume several dozen times more tokens than a single LLM call, because the agent reads the hierarchy, browses synonyms, recomputes, and so on.

This is at once a question of financial cost (tokens, GPU), environmental impact (carbon footprint of LLM inference) and speed. Running a 1,000-concept project entirely through agentic AI could cost tens of euros and take several hours, whereas a tiered pipeline (syntactic + statistical first, then semantic, then agentic AI on the residual only) can handle the same load in minutes for a few cents.
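The tiering rule can be sketched as a simple cascade. Everything specific here is an assumption made for illustration: the `run_cascade` name, the 0.9 acceptance threshold, the shortlist size of 5, and the idea that a high top score is what makes a concept "resolved" (the text does not say how the Skill decides this). Human validation is still expected afterwards in every case.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, float]  # (target concept id, score in [0, 1])
Scorer = Callable[[str, Optional[List[Candidate]]], List[Candidate]]


def run_cascade(
    sources: List[str],
    tiers: List[Tuple[str, Scorer]],
    accept_threshold: float = 0.9,  # assumed cut-off, not a Linkr default
    top_n: int = 5,                 # assumed shortlist size for the next tier
) -> Tuple[Dict[str, Tuple[str, Candidate]], List[str]]:
    """Run tiers from cheapest to most expensive, escalating only unresolved sources.

    Each scorer receives the source and the shortlist built by the previous
    tier (None for the first tier, which searches the whole vocabulary).
    """
    resolved: Dict[str, Tuple[str, Candidate]] = {}
    shortlists: Dict[str, Optional[List[Candidate]]] = {s: None for s in sources}
    for tier_name, scorer in tiers:
        for source in list(shortlists):
            ranked = sorted(scorer(source, shortlists[source]),
                            key=lambda c: c[1], reverse=True)
            if ranked and ranked[0][1] >= accept_threshold:
                # Good enough at this tier: do not spend more on this source.
                resolved[source] = (tier_name, ranked[0])
                del shortlists[source]
            else:
                # Keep only the best candidates for the next, costlier tier.
                shortlists[source] = ranked[:top_n]
    return resolved, list(shortlists)
```

With tiers ordered as cheap, semantic, then agentic, the most expensive scorer is only ever called for sources that the lower tiers left unresolved, and only on their shortlisted candidates.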

The combined score

The combined score (left column, green bar) is a weighted average of each method’s score. Default weights are:

Method	Weight	Role
Syntactic	×1	Catches near-literal matches despite writing variants.
Statistical	×1	Compares value distributions (creatinine vs sodium).
Semantic	×2	Picks up synonyms and rephrasings.
Agentic AI	×3	Final validation by an agentically-driven LLM.

The Manage suggestions button (top left) opens a dialog with sliders to tune each weight in real time — the table re-sorts automatically. From there you can also see the number of loaded scores, import a new file, or wipe everything.
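The weighted average can be sketched as follows. The family keys (`syntactic`, `statistical`, `semantic`, `ai`) and the choice to average only over the methods that actually produced a score for a candidate are assumptions; the text does not specify how Linkr treats missing methods.

```python
from typing import Dict, Mapping, Optional

# Default weights from the table above; the dictionary keys are assumed names.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "syntactic": 1.0,
    "statistical": 1.0,
    "semantic": 2.0,
    "ai": 3.0,
}


def combined_score(
    method_scores: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> Optional[float]:
    """Weighted average over the methods present for this candidate.

    Methods with no score, or with a zero weight, are left out of both the
    numerator and the denominator (an assumption for this sketch).
    """
    used = {m: s for m, s in method_scores.items() if weights.get(m, 0.0) > 0.0}
    total_weight = sum(weights[m] for m in used)
    if total_weight == 0.0:
        return None
    return sum(weights[m] * s for m, s in used.items()) / total_weight
```

For a candidate with syntactic 0.8, semantic 0.9 and AI 1.0, this gives (1 × 0.8 + 2 × 0.9 + 3 × 1.0) / 6 ≈ 0.93; moving a slider amounts to calling the function again with a different `weights` mapping and re-sorting.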

Manage suggestions

Tune the weight of each method in the combined score.

Method weights: Syntactic ×1.0 · Statistical ×1.0 · Semantic ×2.0 · Agentic AI ×3.0

12,384 scores loaded from similarity-scores.parquet

The Manage suggestions dialog: per-method weight sliders, count of loaded scores, and Import / Delete actions.

Selection and mapping

Candidate	Concept name	Score
SNOMED 251071003	Invasive systolic arterial pressure	91%
SNOMED 72313002	Systolic arterial pressure	78%
LOINC 8480-6	Systolic blood pressure	74%

When a row is selected, an action bar appears at the top with the usual mapping buttons. Candidates already mapped (here LOINC 8480-6) appear greyed out and struck through.

Click a row to select it. The action bar at the top then shows the usual mapping buttons (Exact match + SKOS dropdown + comment button).
Candidates already mapped for this source concept are greyed out and not re-selectable.
The ⓘ icon at the row end opens the target OMOP concept’s detail (description, synonyms, hierarchy).

The main alignment button “Exact” is pre-filled with the SKOS equivalence proposed by the agentic AI (exactMatch, narrowMatch, broadMatch, relatedMatch) — you can accept it in one click, or open the dropdown to pick a different equivalence if you disagree.

The Claude `concept-mapping` Skill

The skill is located in the Linkr repository, under .claude/skills/concept-mapping/. It is a standalone folder that holds:

SKILL.md — the orchestrator, read by Claude Code when you type /concept-mapping.
reference.md — DuckDB patterns, TypeScript types, domain heuristics, parquet schema.
scripts/embed_concepts.py — generates BioLORD embeddings for the OMOP vocabulary (run once per vocabulary release).
scripts/compute_scores.py — computes scores for a given project (run once per project).
scripts/update_state.py — recomputes state.json (progress, sessions, file presence).
review-template/ — static HTML dashboard copied into the project folder the first time you launch the review server.

Plus two companion sub-skills, invoked automatically by the orchestrator:

concept-mapping-ai — Measurement, Condition, Procedure, Observation.
concept-mapping-drug — drugs (RxNorm, dosages, pharmaceutical forms).

Requirements

Resource	Why
Claude Code installed (`claude` CLI)	To run the skill.
OHDSI vocabularies locally (parquet or CSV)	Downloaded from ATHENA.
Python 3 + `sentence-transformers`, `pandas`, `duckdb`, `pyarrow`	For the embedding and scoring scripts.
~10–30 GB of disk	The `concept_embeddings.parquet` file for ~4M concepts weighs several GB.
GPU recommended	The initial embedding pass can take hours on CPU.

Configuration

Rather than typing paths every run, create a config.local.json at the project root:

{
  "concept-mapping": {
    "vocab_dir":     "/path/to/ohdsi-vocabularies",
    "models_dir":    "/path/to/bert-models-cache",
    "projects_dir":  "/path/to/mapping-projects"
  }
}

Details of the three keys:

vocab_dir: folder holding the OHDSI vocabularies downloaded from ATHENA (CONCEPT.parquet, CONCEPT_SYNONYM.parquet, CONCEPT_RELATIONSHIP.parquet, CONCEPT_ANCESTOR.parquet, plus VOCABULARY.parquet and DOMAIN.parquet). Roughly 10–30 GB depending on the active vocabularies. The skill loads these files into DuckDB at launch.
models_dir: local cache for sentence-embedding models (by default BioLORD-2023-M, ~500 MB). If you switch models (PubMedBERT, SapBERT…), the new model is downloaded here on first use, which avoids re-downloading it for every project.
projects_dir: root of mapping projects. Each subfolder is a project exported from Linkr (project.json, mappings.json, source-concepts.csv) and later receives the files written by the skill (similarity-scores.parquet, source_embeddings.parquet, state.json).

The skill reads these values at launch and no longer prompts for the paths.
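To make the expected shape of the file concrete, here is a minimal sketch of how such a configuration could be read and checked in Python. The function names are hypothetical and this is not the Skill's own loader; only the section name and the three keys come from the example above.

```python
import json
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("vocab_dir", "models_dir", "projects_dir")


def parse_concept_mapping_config(text: str) -> Dict[str, Path]:
    """Parse the JSON text and return the three paths of the 'concept-mapping' section."""
    section = json.loads(text).get("concept-mapping", {})
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"missing keys in 'concept-mapping': {', '.join(missing)}")
    # expanduser() lets the file use '~' for the home directory.
    return {key: Path(section[key]).expanduser() for key in REQUIRED_KEYS}


def load_concept_mapping_config(path: str = "config.local.json") -> Dict[str, Path]:
    """Read the config file from disk (path relative to the project root)."""
    return parse_concept_mapping_config(Path(path).read_text(encoding="utf-8"))
```

Failing fast on a missing key gives a clearer error than discovering a bad path midway through a long embedding run.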

Skill steps

The orchestrator runs through the following steps automatically:

Resume where you left off

If you’ve worked on this project before, the skill summarises where you stand (how many concepts mapped, which methods already ran, when the last session was) and offers to open a local dashboard in your browser to visualise progress.

Read the project

The best path is to export your project from Linkr (Export tab) and give the skill the path to the ZIP or the extracted folder. The skill finds project.json, mappings.json and source-concepts.csv inside.

Pick which concepts to process

You decide the scope: all unmapped concepts, a specific clinical category, the most frequent ones, or a list of codes you provide. No need to run through everything in one go.

Load the OMOP vocabularies

The skill loads the OHDSI tables it needs into memory (concept names, synonyms, hierarchy) so it can compare the source concept against each candidate quickly.

Compute syntactic + semantic + statistical scores

The skill runs the cheaper methods first: string comparison, embeddings, distribution comparison. Computation can take a while on a large vocabulary (up to a few hours on CPU), but it’s resumable — if you interrupt and relaunch, work already done isn’t redone.

Let the agentic AI decide

The skill hands over to a domain-specialised sub-skill (measurements, conditions, procedures, drugs). The LLM reads each candidate with its clinical context and proposes a justified mapping with a SKOS equivalence.

Write the results

All suggestions land in a similarity-scores.parquet file that you re-import into Linkr via Manage suggestions → Import. The agentic AI then appears as one more method in the Suggestions tab.

The agent reuses the similarities produced by the methods described above (syntactic, semantic, statistical) to pre-rank candidates before deciding; it does not introduce a separate scoring scheme. It simply adds its own ai/<model-id> row next to the others, with a SKOS equivalence and a comment justifying its choice.

Output files

The skill writes two data files to the project folder (alongside state.json, which tracks progress):

source_embeddings.parquet — the embedding vectors of source concepts. Kept so embeddings can be reused on the next run without recomputing everything.
similarity-scores.parquet — one score per (source, candidate, method) triple. This is the file you re-import into Linkr.

Preview of source_embeddings.parquet:

source_embeddings.parquet (3 rows out of 1,234 · 384-dimensional vectors)

source_vocabulary_id	source_concept_code	source_concept_name	embedding (excerpt)	model	created_at
ICCA	M0118	TA_systolique_invasive	[ 0.42,-0.18, 0.71, 0.05,-0.33, 0.58, 0.22,-0.41, …]	BioLORD-2023-M	2026-05-14T09:31:02Z
ICCA	M0042	Frequence_cardiaque	[ 0.39,-0.22, 0.68, 0.08,-0.30, 0.61, 0.19,-0.38, …]	BioLORD-2023-M	2026-05-14T09:31:02Z
ICCA	M0071	Natremie	[-0.15, 0.44, 0.12, 0.59, 0.31,-0.22,-0.48, 0.27, …]	BioLORD-2023-M	2026-05-14T09:31:02Z

The embedding column holds the full vector (384 values); only the first 8 dimensions are shown here for readability.

One embedding vector (384 values) per source concept.

Preview of similarity-scores.parquet for a source concept (invasive systolic BP):

similarity-scores.parquet (4 rows out of 12,384)

source_vocabulary_id	source_concept_code	concept_id	method	score	equivalence	comment	created_at
ICCA	M0118	4353843	syntactic/jaro-winkler	0.83	—	—	2026-05-14T09:32:11Z
ICCA	M0118	4353843	semantic/biolord	0.91	—	—	2026-05-14T09:34:02Z
ICCA	M0118	4353843	ai/claude-sonnet-4-6	0.97	exactMatch	Concept dedicated to invasive measurement (arterial catheter).	2026-05-14T09:35:48Z
ICCA	M0118	3004249	ai/claude-sonnet-4-6	0.88	broadMatch	Broader LOINC, covers any measurement method.	2026-05-14T09:35:48Z

One row per (source concept, target concept, method) triple. The agentic AI method also fills in a SKOS equivalence and a comment justifying its choice.
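Because the file is in long format (one row per method), a consumer typically regroups the rows per candidate before ranking. The sketch below shows that regrouping on plain dictionaries mirroring the preview above; the function names are hypothetical, and reading the actual file (for example with `pandas.read_parquet`) is left out so the example stays dependency-free.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

CandidateKey = Tuple[str, str, int]  # (source_vocabulary_id, source_concept_code, concept_id)


def method_family(method: str) -> str:
    """'semantic/biolord' -> 'semantic'; the part before the slash is the family."""
    return method.split("/", 1)[0]


def pivot_scores(rows: Iterable[Mapping]) -> Dict[CandidateKey, Dict[str, float]]:
    """Group long-format rows into one {method: score} mapping per candidate."""
    pivot: Dict[CandidateKey, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        key = (row["source_vocabulary_id"], row["source_concept_code"], row["concept_id"])
        pivot[key][row["method"]] = row["score"]
    return dict(pivot)


# Rows copied from the preview above (equivalence and comment columns omitted).
rows = [
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118",
     "concept_id": 4353843, "method": "syntactic/jaro-winkler", "score": 0.83},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118",
     "concept_id": 4353843, "method": "semantic/biolord", "score": 0.91},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118",
     "concept_id": 4353843, "method": "ai/claude-sonnet-4-6", "score": 0.97},
    {"source_vocabulary_id": "ICCA", "source_concept_code": "M0118",
     "concept_id": 3004249, "method": "ai/claude-sonnet-4-6", "score": 0.88},
]
by_candidate = pivot_scores(rows)
```

Here `by_candidate` holds three method scores for concept 4353843 and a single AI score for concept 3004249, which is the per-candidate view that the Suggestions table displays.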

Resume on interrupt

The scripts write their output incrementally, which lets you resume where the previous run stopped:

embed_concepts.py saves progress every 50 batches (~25,000 concepts) — you can interrupt without losing work.
compute_scores.py saves every 100 source concepts — already-scored (source_vocabulary_id, source_concept_code) pairs are skipped on restart.

This matters for full vocabularies (4M concepts) that can take hours.
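The resume pattern can be sketched as "skip what is already on disk, append in batches". This is an illustration only: it writes JSON Lines instead of parquet to stay within the standard library, the function names are hypothetical, and flushing the pending batch when an exception interrupts the loop is a design choice of this sketch rather than documented behaviour of compute_scores.py.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set, Tuple

SourceKey = Tuple[str, str]  # (source_vocabulary_id, source_concept_code)


def load_done_keys(path: str) -> Set[SourceKey]:
    """Return the source keys already present in the output file."""
    done: Set[SourceKey] = set()
    file = Path(path)
    if file.exists():
        with file.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    row = json.loads(line)
                    done.add((row["source_vocabulary_id"], row["source_concept_code"]))
    return done


def score_with_resume(
    sources: Iterable[SourceKey],
    score_fn: Callable[[str, str], float],
    path: str,
    save_every: int = 100,
) -> int:
    """Score sources not yet in `path`, appending results every `save_every` items."""
    done = load_done_keys(path)
    buffer: List[dict] = []
    processed = 0

    def flush() -> None:
        if buffer:
            with Path(path).open("a", encoding="utf-8") as handle:
                for row in buffer:
                    handle.write(json.dumps(row) + "\n")
            buffer.clear()

    try:
        for vocab, code in sources:
            if (vocab, code) in done:
                continue  # already scored in a previous run
            buffer.append({"source_vocabulary_id": vocab,
                           "source_concept_code": code,
                           "score": score_fn(vocab, code)})
            processed += 1
            if processed % save_every == 0:
                flush()
    finally:
        flush()  # keep finished work even if the loop was interrupted
    return processed
```

Calling `score_with_resume` a second time with the same `path` only processes the sources that are missing from the file, so an interrupted run costs at most the item that was in progress.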
