Defining your variables well: the key to reliable data collection

Concept, temporal anchor, time window, and aggregate function — four dimensions for unambiguous variables.

TL;DR

If two people extract the same data from the same chart and get different results, it’s often because the variable wasn’t defined precisely enough. Defining a variable isn’t just about choosing a concept — it’s also about specifying when, over what period, and which value to retain. This four-dimension framework reduces errors and ambiguity, whether the collection is done manually or through a data warehouse.

The problem: one variable, multiple interpretations

Let’s take a simple example: you want to collect the serum creatinine for each patient.

One concept, three different results

A patient is admitted on January 5 at 2:00 PM. They have three creatinine measurements:

  • January 5, 4:00 PM: 92 µmol/L
  • January 6, 6:00 AM: 118 µmol/L
  • January 6, 6:00 PM: 104 µmol/L

Which value do you enter in your spreadsheet? The first one (92)? The highest (118)? The last one (104)? Or only a value from the first 24 hours, which rules out the third measurement?

Without explicit instructions, each person doing the collection will make their own choice — and that choice will vary from one patient to another, even for the same person. It’s this type of micro-decision that produces heterogeneous data.
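The divergence can be made concrete in a few lines of Python. This is only a sketch: the year (2024) is an arbitrary assumption, since the example gives just the day and time, and the rule names are illustrative.

```python
from datetime import datetime

# Measurements from the example above: (timestamp, creatinine in µmol/L).
# The year is arbitrary; the example only specifies day and time.
measurements = [
    (datetime(2024, 1, 5, 16, 0), 92),
    (datetime(2024, 1, 6, 6, 0), 118),
    (datetime(2024, 1, 6, 18, 0), 104),
]

# Values in chronological order.
values = [value for _, value in sorted(measurements)]

# Three plausible readings of "the serum creatinine", three different answers.
candidates = {
    "first": values[0],
    "highest": max(values),
    "last": values[-1],
}

for rule, value in candidates.items():
    print(f"{rule}: {value}")
```

Each rule is defensible on its own, which is exactly why the choice has to be written down rather than left to the person doing the collection.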

The four dimensions of a variable

For a variable to be defined without ambiguity, four elements must be specified.

1. The concept

What is being measured: heart rate, serum creatinine, primary diagnosis, SOFA score (Sequential Organ Failure Assessment)… This is the most intuitive dimension, the one we usually write down first. The unit of measurement, when relevant, is part of the concept.

2. The temporal anchor

The reference point in the patient's journey from which the data is sought. For example: ICU admission, start of mechanical ventilation, diagnosis of sepsis…

3. The time window

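The period, relative to the anchor, during which the data is searched, expressed in hours (H) or days (D) from the anchor. For example: H0 to H24 (the first 24 hours after admission), or D-365 to H0 (the year before admission, for medical history).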

4. The aggregate function

The rule that decides which value to retain when several exist within the window: the first, the last, the maximum, the minimum, the mean, presence/absence…
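The four dimensions map naturally onto a small data structure. The sketch below is one possible encoding, not a reference implementation: the class name `VariableDefinition`, the `AGGREGATES` registry, and the choice to treat both window bounds as inclusive are assumptions made for illustration, and the heart-rate values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# Aggregate functions named in the text. Each receives the in-window values
# in chronological order and returns None when there is nothing to aggregate.
AGGREGATES: Dict[str, Callable[[List[float]], object]] = {
    "first": lambda v: v[0] if v else None,
    "last": lambda v: v[-1] if v else None,
    "maximum": lambda v: max(v) if v else None,
    "minimum": lambda v: min(v) if v else None,
    "mean": lambda v: mean(v) if v else None,
    "presence": lambda v: len(v) > 0,
}


@dataclass(frozen=True)
class VariableDefinition:
    concept: str                        # 1. what is measured (with its unit)
    anchor: str                         # 2. label of the reference event
    window_start: Optional[timedelta]   # 3. offset from the anchor; None = no limit
    window_end: Optional[timedelta]     #    (bounds assumed inclusive)
    aggregate: str                      # 4. key into AGGREGATES

    def extract(self, anchor_time: datetime,
                observations: List[Tuple[datetime, float]]):
        """Apply window and aggregate to one patient's values of this concept.

        The caller supplies the timestamp of the anchor event named in `anchor`.
        """
        lower = None if self.window_start is None else anchor_time + self.window_start
        upper = None if self.window_end is None else anchor_time + self.window_end
        in_window = [
            value for when, value in sorted(observations)
            if (lower is None or when >= lower) and (upper is None or when <= upper)
        ]
        return AGGREGATES[self.aggregate](in_window)


# Usage: first heart rate in the hour following ICU admission.
hr_at_admission = VariableDefinition(
    concept="Heart rate",
    anchor="ICU admission",
    window_start=timedelta(hours=0),
    window_end=timedelta(hours=1),
    aggregate="first",
)
admission = datetime(2024, 1, 5, 14, 0)
heart_rates = [  # hypothetical values
    (datetime(2024, 1, 5, 14, 10), 112),
    (datetime(2024, 1, 5, 14, 40), 98),
    (datetime(2024, 1, 5, 15, 30), 90),
]
print(hr_at_admission.extract(admission, heart_rates))  # 112
```

Keeping the aggregate as a named key rather than free text means a definition cannot be saved with a rule nobody has agreed on.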

A complete example

Let’s go back to serum creatinine. Here’s how to define it unambiguously:

| Dimension          | Value                     |
|--------------------|---------------------------|
| Concept            | Serum creatinine (µmol/L) |
| Temporal anchor    | First ICU admission       |
| Time window        | H0 to H24                 |
| Aggregate function | Maximum                   |

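With this definition, two people extracting the data from the same chart will get the same result, whether the collection is manual or done through a database query. For the patient above, taking the 2:00 PM admission as the ICU admission, the window runs from January 5, 2:00 PM to January 6, 2:00 PM: the third measurement falls outside it, and the retained value is 118 µmol/L.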
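As an illustration of the database side, the same definition can be written as a query. The sketch below uses an in-memory SQLite database from Python. The `stays` and `measurements` tables, their columns, and the concept code `serum_creatinine` are hypothetical, not the schema of any particular data warehouse; the year is arbitrary, and one row per patient in `stays` stands in for the first ICU admission.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one ICU stay per patient, timestamps as ISO text.
    CREATE TABLE stays (patient_id INTEGER, icu_admission TEXT);
    CREATE TABLE measurements (
        patient_id INTEGER, concept TEXT, measured_at TEXT, value REAL
    );
    INSERT INTO stays VALUES (1, '2024-01-05 14:00:00');
    INSERT INTO measurements VALUES
        (1, 'serum_creatinine', '2024-01-05 16:00:00', 92),
        (1, 'serum_creatinine', '2024-01-06 06:00:00', 118),
        (1, 'serum_creatinine', '2024-01-06 18:00:00', 104);
""")

# Concept -> WHERE on concept; anchor -> icu_admission;
# window H0 to H24 -> the two timestamp bounds; aggregate -> MAX.
query = """
    SELECT s.patient_id, MAX(m.value) AS creatinine_max_h0_h24
    FROM stays AS s
    JOIN measurements AS m ON m.patient_id = s.patient_id
    WHERE m.concept = 'serum_creatinine'
      AND m.measured_at >= s.icu_admission
      AND m.measured_at <= datetime(s.icu_admission, '+24 hours')
    GROUP BY s.patient_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 118.0)]
```

Each of the four dimensions ends up in an identifiable clause of the query, which is what makes the manual and automated collections comparable.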

More examples to illustrate

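| Variable                     | Concept                         | Anchor           | Window         | Aggregate          |
|------------------------------|---------------------------------|------------------|----------------|--------------------|
| HR at admission              | Heart rate                      | ICU admission    | H0 to H1       | First              |
| History of diabetes          | Diabetes diagnosis              | ICU admission    | No limit to H0 | Presence (yes/no)  |
| Max lactate on D1            | Serum lactate                   | ICU admission    | H0 to H24      | Maximum            |
| Norepinephrine during sepsis | Norepinephrine (administration) | Sepsis diagnosis | H0 to H72      | Presence (yes/no)  |
| Length of stay               | ICU stay                        | ICU admission    | Full duration  | Duration (in days) |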
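The "History of diabetes" row combines an open-ended lookback with a presence aggregate, which behaves a little differently from the numeric examples. A minimal sketch, assuming diagnoses are available as dated labels; the function name `history_of`, the diagnosis list, and the dates are all hypothetical.

```python
from datetime import datetime

# ICU admission is the anchor (H0); the year is arbitrary.
admission = datetime(2024, 1, 5, 14, 0)

# Hypothetical coded diagnoses for one patient: (date recorded, label).
diagnoses = [
    (datetime(2015, 3, 12), "diabetes"),
    (datetime(2024, 1, 7), "acute kidney injury"),
]


def history_of(concept, diagnoses, anchor_time):
    """Presence (yes/no) of a diagnosis at any time up to the anchor.

    The window has no lower limit and ends at H0, so anything recorded
    after the anchor does not count as history.
    """
    return any(
        label == concept and when <= anchor_time
        for when, label in diagnoses
    )


print(history_of("diabetes", diagnoses, admission))             # True
print(history_of("acute kidney injury", diagnoses, admission))  # False
```

The second call returns False because that diagnosis was recorded after admission: even a yes/no variable depends on where the window ends.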

The anchor isn't always admission

The temporal anchor depends on the research question. If you’re studying post-intubation complications, the anchor would be the start of mechanical ventilation. If you’re looking at medical history, the anchor can remain admission, but the window extends backward: you search for diagnoses prior to admission, with no lower time limit.
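Changing the anchor shifts the whole window, even when the relative bounds stay the same. The sketch below resolves an H0 to H24 window against two different anchors. The event names, timestamps, and the helper `window_bounds` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical event timestamps for one stay; names and times are illustrative.
events = {
    "icu_admission": datetime(2024, 1, 5, 14, 0),
    "mechanical_ventilation_start": datetime(2024, 1, 6, 9, 0),
}


def window_bounds(events, anchor, start_hours, end_hours):
    """Turn a window relative to the anchor (in hours) into absolute timestamps."""
    t0 = events[anchor]
    return t0 + timedelta(hours=start_hours), t0 + timedelta(hours=end_hours)


# Same relative window (H0 to H24), two anchors, two different periods.
for anchor in events:
    start, end = window_bounds(events, anchor, 0, 24)
    print(f"{anchor}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```

A measurement taken on January 6 at 6:00 PM would be outside the first window and inside the second, so the anchor has to be stated alongside the window.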

Why it matters — even for manual collection

One might think this framework is mainly useful for database queries on a data warehouse. In reality, it’s just as essential for manual collection.

Without a framework, collection drifts

When collection spans several weeks, implicit choices evolve. The person doing the collection ends up applying different decision rules at the beginning and end of the process — without even realizing it. An explicit framework protects against this drift.

Well-defined variables can also be translated directly into data warehouse queries.

Linkr’s Study Designer

Linkr offers a dedicated tool for this step: the Study Designer. It guides clinicians through defining each variable along the four dimensions — concept, temporal anchor, time window, aggregate — and automatically generates a structured protocol, exportable in Word, Excel, or JSON.

A tool to structure your protocol

The Study Designer is freely accessible on its dedicated page. It lets you define your variables, inclusion criteria, and analysis plan — all in an interface designed for clinicians.

Key takeaways

  • Defining a variable means specifying four dimensions: the concept, the temporal anchor, the time window, and the aggregate function.
  • This framework reduces errors and ambiguity, whether the collection is manual or automated.
  • Well-defined variables can be directly translated into data warehouse queries.
  • Linkr's Study Designer helps structure this definition into an exportable protocol.