InterHop Datathon 2024

The InterHop association is organizing a datathon in September 2024 dedicated to health data, providing participants with access to the MIMIC database in OMOP format. This event offers a unique opportunity to collaborate on data science, data engineering, and artificial intelligence projects applied to healthcare.

Practical information

Participants are encouraged to create a PhysioNet account in advance and to sign the Data Use Agreement (DUA).

A kickoff meeting is scheduled for Thursday, August 8 at 1:00 PM to form teams and refine the projects.

All practical information about the datathon is available in a dedicated article.

Proposed themes and projects

Over the course of 48 hours, several key topics will be explored, ranging from data quality to mortality prediction and health indicator visualization.

1. FINESS+

Update the geographic data of healthcare facilities to ensure interoperability with open systems.

  • Main themes: Data engineering, Open data
  • Objective: Update and structure the geographic data of the 100,000 healthcare facilities listed in FINESS, making them interoperable with open systems like OpenStreetMap.
  • Relevance: Enables cross-referencing of existing databases and improves data quality, with a potential contribution to Toobib.org.

2. Maternity activity indicators

Develop interactive dashboards to analyze and monitor maternity activity indicators.

  • Main themes: Data visualization, Data science
  • Objective: Create dynamic dashboards to visualize annual maternity activity indicators (C-sections, transfers, epidurals, etc.).
  • Methodology: Use of LinkR, an open-source platform that facilitates health data analysis.
  • Expected impact: Better management of maternity units and closer monitoring of obstetric practices.

3. Data quality

Detect and correct biases in health data.

  • Main themes: Data cleaning, Pre-processing
  • Objective: Produce data quality indicators to assess potential biases from data entry software and improve the reliability of clinical data.
  • Methodology: Compare variables from the MIMIC-OMOP database using statistical methods and anomaly detection models.
  • Expected impact: Improved health data quality and easier reuse for research.

4. Mortality prediction

Build mortality prediction models to improve patient comparability.

  • Main themes: Machine learning, Statistics
  • Objective: Develop and compare predictive models of ICU mortality (logistic regression, Random Forest, XGBoost, neural networks).
  • Methodology: Train models on MIMIC-OMOP and compare them with traditional scores like SOFA and SAPS II.
  • Expected impact: Development of an interoperable model applicable to local data, enhancing patient comparability in studies.

5. ICD-10 coding assistance

Assist ICD-10 coding with AI.

  • Main themes: NLP, Large Language Models
  • Objective: Automate the identification of ICD-10 codes.
  • Methodology: An approach based on large language models (LLMs) and Retrieval-Augmented Generation (RAG).
  • Expected impact: Improved accuracy and efficiency of medical coding.

Conclusion

This datathon is part of an open science initiative, with the goal of producing interoperable and reusable code for the entire medical and scientific community.

Each project aims to improve the quality, visualization, and use of health data, while encouraging collaboration between clinicians, data scientists, and developers.

The source code produced during the event will be published on Framagit, and the results may be integrated into platforms like LinkR so they can be reused and enhanced after the event.

Last modified 2025.03.24: Add EN news (2109e6f)