Create a subset

Analyze population subsets

Introduction

During a project, it is often necessary to work on a subset of the patient population within a dataset.

For example, it might be useful to create a subset “Included patients” containing only the patients ultimately included in the final analyses of a study.

Similarly, one could imagine creating a subset of patients with a certain diagnosis, included within a specific period, or exposed to a particular treatment.

All of this is possible through the creation of subsets.

Creating a subset

To create a subset, first load a project, then click the “Subsets” icon in the top menu, to the right of the loaded project’s name.

You will arrive on the project subsets page.

A subset is a selection of patients from a dataset, but it belongs to a project. If two projects use the same dataset, they will not share the same subsets.

A subset “All Patients” is created by default when a project is created.

To create a subset, click the “+” icon on the left side of the screen.

Choose a name, then click “Add.” For this example, we will create a subset containing patients aged over 50 years.

Click on the subset you just created: you will be taken to the page of the selected subset.

On the right side of the screen, there are two tabs:

  • Summary: displays the subset’s information, which can be modified (including the subset’s description).
  • Code: lets you edit the subset’s code, which adds patients to or removes them from the subset. We will explore this in the following sections.

Adding patients to a subset

To add patients to a subset, use the add_patients_to_subset function.

This function takes the following arguments:

  • patients: A numeric vector containing the IDs of the patients to add.
  • subset_id: The ID of the subset to which the patients will be added. In the subset code, write the placeholder %subset_id%, which is then replaced with the ID of the selected subset.
  • output, r, m, i18n, and ns: Arguments needed for data manipulation and error message display.

When a subset is created, code is automatically generated to add all patients to the subset.

This code runs when the user clicks the button to run it, or when the subset is selected in the project and does not yet contain any patients.

We will modify this code to add patients aged over 50.

First, let’s write the code that adds a column with each patient’s age at the start of each visit.

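# One row per visit, with the patient's age at the start of that visit
d$visit_occurrence %>%
    # Attach each patient's birth date to their visits
    dplyr::left_join(
        d$person %>% dplyr::select(person_id, birth_datetime),
        by = "person_id"
    ) %>%
    # Bring the joined rows into memory before computing the age
    dplyr::collect() %>%
    dplyr::mutate(
        # Age in years, rounded to one decimal place
        age = round(as.numeric(difftime(visit_start_datetime, birth_datetime, units = "days")) / 365.25, 1)
    )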

The code editor of the selected subset lets you test the code. We will extract the IDs of patients aged over 50, that is, patients who were over 50 at the start of at least one visit. For now, we will comment out the call to add_patients_to_subset.

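d$visit_occurrence %>%
    # Attach each patient's birth date to their visits
    dplyr::left_join(
        d$person %>% dplyr::select(person_id, birth_datetime),
        by = "person_id"
    ) %>%
    dplyr::collect() %>%
    dplyr::mutate(
        # Age in years at the start of the visit
        age = round(as.numeric(difftime(visit_start_datetime, birth_datetime, units = "days")) / 365.25, 1)
    ) %>%
    # Keep visits that started after age 50
    dplyr::filter(age > 50) %>%
    # One row per patient, then return the IDs as a vector
    dplyr::distinct(person_id) %>%
    dplyr::pull()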

Our code works, so we can store these IDs in a variable and then integrate them into the add_patients_to_subset function.
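Putting the pieces together, the subset code might look like the sketch below. The argument names come from the list above; passing them by name is an assumption, as are the toy d tables and the stand-in function in the first section, which exist only so the snippet also runs outside the application. In the subset code editor those objects already exist, and the subset ID is written as the %subset_id% placeholder rather than as a variable.

```r
library(dplyr)

# --- Stand-ins (assumptions, only for running outside the application) ---
# In the subset code editor, d, output, r, m, i18n, ns and
# add_patients_to_subset() are already provided: skip this section there.
if (!exists("add_patients_to_subset")) {
    utc <- function(x) as.POSIXct(x, tz = "UTC")
    d <- list(
        person = tibble(
            person_id = 1:3,
            birth_datetime = utc(c("1950-01-01", "1990-06-15", "1960-03-10"))
        ),
        visit_occurrence = tibble(
            person_id = c(1L, 2L, 3L, 3L),
            visit_start_datetime = utc(c("2020-01-01", "2020-01-01", "2005-01-01", "2015-01-01"))
        )
    )
    output <- r <- m <- i18n <- ns <- NULL
    subset_id <- 1L  # hypothetical ID, stands in for %subset_id%
    add_patients_to_subset <- function(output, r, m, patients, subset_id, i18n, ns) {
        message(length(patients), " patient(s) added to subset ", subset_id)
        invisible(patients)
    }
}

# --- Subset code ---
# IDs of patients who were over 50 at the start of at least one visit
patients <-
    d$visit_occurrence %>%
    dplyr::left_join(
        d$person %>% dplyr::select(person_id, birth_datetime),
        by = "person_id"
    ) %>%
    dplyr::collect() %>%
    dplyr::mutate(
        age = round(as.numeric(difftime(visit_start_datetime, birth_datetime, units = "days")) / 365.25, 1)
    ) %>%
    dplyr::filter(age > 50) %>%
    dplyr::distinct(person_id) %>%
    dplyr::pull()

add_patients_to_subset(
    output = output, r = r, m = m,
    patients = patients,
    subset_id = subset_id,  # in the subset code: subset_id = %subset_id%
    i18n = i18n, ns = ns
)
```

With the toy data, patient 3 is selected even though one of their visits started before age 50, because the filter keeps any patient with at least one visit after that age.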

A message confirms that the patients have been successfully added to the subset.

Graphical interface

There is not yet a graphical interface for filtering patients based on certain characteristics (age, gender, length of stay, hospitalization dates, or the presence of concepts such as diagnoses or treatments), although one would be very useful.

This graphical interface is planned for the next version.


Removing patients from a subset

To remove patients from a subset, use the remove_patients_from_subset function. It works like add_patients_to_subset and takes the same arguments, notably patients and subset_id.

For example, after adding all patients to the subset, you could remove those aged 50 or younger.
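As a sketch of that example, the code below removes every patient who was never over 50 at the start of a visit, which is the complement of the filter used when adding patients. It assumes remove_patients_from_subset accepts the same named arguments as add_patients_to_subset, as described above. The toy tables and the stand-in function are assumptions that exist only so the snippet runs outside the application.

```r
library(dplyr)

# --- Stand-ins (assumptions, only for running outside the application) ---
# In the subset code editor these objects are already provided: skip this section.
if (!exists("remove_patients_from_subset")) {
    utc <- function(x) as.POSIXct(x, tz = "UTC")
    d <- list(
        person = tibble(
            person_id = 1:3,
            birth_datetime = utc(c("1950-01-01", "1990-06-15", "1960-03-10"))
        ),
        visit_occurrence = tibble(
            person_id = c(1L, 2L, 3L, 3L),
            visit_start_datetime = utc(c("2020-01-01", "2020-01-01", "2005-01-01", "2015-01-01"))
        )
    )
    output <- r <- m <- i18n <- ns <- NULL
    subset_id <- 1L  # hypothetical ID, stands in for %subset_id%
    remove_patients_from_subset <- function(output, r, m, patients, subset_id, i18n, ns) {
        message(length(patients), " patient(s) removed from subset ", subset_id)
        invisible(patients)
    }
}

# --- Subset code ---
# Patients whose oldest age at the start of a visit is 50 or less
patients_to_remove <-
    d$visit_occurrence %>%
    dplyr::left_join(
        d$person %>% dplyr::select(person_id, birth_datetime),
        by = "person_id"
    ) %>%
    dplyr::collect() %>%
    dplyr::mutate(
        age = round(as.numeric(difftime(visit_start_datetime, birth_datetime, units = "days")) / 365.25, 1)
    ) %>%
    dplyr::group_by(person_id) %>%
    dplyr::summarize(max_age = max(age), .groups = "drop") %>%
    dplyr::filter(max_age <= 50) %>%
    dplyr::pull(person_id)

remove_patients_from_subset(
    output = output, r = r, m = m,
    patients = patients_to_remove,
    subset_id = subset_id,  # in the subset code: subset_id = %subset_id%
    i18n = i18n, ns = ns
)
```

Patients without any visit do not appear in d$visit_occurrence, so this sketch leaves them in the subset.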

Integration into plugins

It would be useful to create an Individual Data plugin for excluding patients based on one or more exclusion criteria defined by the user.

For instance, this plugin could remove patients from the “Included Patients” subset and add them to the “Excluded Patients” subset.

To achieve this, you would simply use the add_patients_to_subset and remove_patients_from_subset functions.

How can you retrieve the IDs of the subsets? They are available in the m$subsets variable.
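For instance, a plugin could look up both subset IDs by name and then move the excluded patients from one subset to the other. This is only a sketch: the id and name columns of m$subsets are assumptions about its structure, get_subset_id is a hypothetical helper, and the toy objects and stand-in functions exist solely so the snippet runs outside the application.

```r
library(dplyr)

# --- Stand-ins (assumptions, only for running outside the application) ---
# Inside a plugin, m, output, r, i18n, ns and both functions are already provided.
if (!exists("add_patients_to_subset")) {
    m <- list(
        subsets = tibble(
            id = c(10L, 11L, 12L),  # assumed column name
            name = c("All Patients", "Included Patients", "Excluded Patients")  # assumed column name
        )
    )
    output <- r <- i18n <- ns <- NULL
    add_patients_to_subset <- function(output, r, m, patients, subset_id, i18n, ns) {
        message(length(patients), " patient(s) added to subset ", subset_id)
        invisible(patients)
    }
    remove_patients_from_subset <- function(output, r, m, patients, subset_id, i18n, ns) {
        message(length(patients), " patient(s) removed from subset ", subset_id)
        invisible(patients)
    }
}

# Hypothetical helper: ID of the subset with the given name
get_subset_id <- function(subsets, subset_name) {
    subsets %>%
        dplyr::filter(name == subset_name) %>%
        dplyr::pull(id)
}

included_id <- get_subset_id(m$subsets, "Included Patients")
excluded_id <- get_subset_id(m$subsets, "Excluded Patients")

# Example values: IDs of the patients matching the user's exclusion criteria
excluded_patients <- c(2L, 5L)

# Move the patients from the "Included Patients" subset to "Excluded Patients"
remove_patients_from_subset(
    output = output, r = r, m = m,
    patients = excluded_patients, subset_id = included_id,
    i18n = i18n, ns = ns
)
add_patients_to_subset(
    output = output, r = r, m = m,
    patients = excluded_patients, subset_id = excluded_id,
    i18n = i18n, ns = ns
)
```

Looking the IDs up by name keeps the plugin independent of the numeric IDs, which differ from one project to another.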

Now all that’s left is to create the plugin!

Last modified 2025.01.05: Updaet EN add a subset doc (#11) (90a57db)