Standardization of Clinical Data for AI Applications

Project duration: 01.05.2022 – 30.04.2025

Research Area: Medical Informatics, Interoperability, Standardization, OMOP

The project “Standardization of Clinical Data for AI Applications” examines advances and challenges in Medical Informatics, focusing on the use of Common Data Models (CDMs) for health data, particularly in the context of rare diseases and patient-level prediction. We explore how the OMOP CDM can be adapted to specific medical research needs. This work includes a review of methods for developing CDMs, a comparison of different implementations (web-based and native) in OMOP-based patient-level prediction studies, and prototypical classification studies. Together, these studies assess the potential and limitations of current standardization methodologies in Medical Informatics and emphasize the need for continual development to improve the accuracy and efficiency of AI-powered clinical data analysis and predictive modelling.

Problem and Aims

The project “Standardization of Clinical Data for AI Applications” addresses challenges in applying AI models for patient-level predictions using standardized clinical data in the OMOP Common Data Model. It specifically focuses on evaluating and comparing the feasibility of web-based OHDSI tools (like PLP) against native R-based solutions. The goal is to identify the most effective tools for clinical research, considering their performance, execution time, and ease of implementation, thereby guiding researchers in selecting optimal methodologies for their medical informatics projects.

Practical Example

As an example, our GitHub repository “Benchmarking-Analysis-of-PLP-vs-MLR3” contains scripts and cohort definitions for comparing the R-based packages Patient-Level Prediction (PLP) and mlr3 on the SynPUF 5% dataset. Cohorts were defined in the ATLAS platform, and SQL files are provided for the target population and outcome cohort definitions. The repository lists the prerequisites, such as a PostgreSQL database (or any other OMOP CDM-supported database) to store and handle the SynPUF 5% dataset. This setup underlies the benchmarking analysis in our current study. Further details are available in the repository.
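
The idea of a target population and an outcome cohort can be sketched in a few lines. The following Python example is a minimal illustration, not the repository's actual SQL. It makes several assumptions: an in-memory SQLite database stands in for PostgreSQL, the two OMOP-style tables carry only the columns needed here, the concept IDs (1001, 2002) and all rows are made up, `label_cohort` is a hypothetical helper, and the 365-day time-at-risk window is chosen arbitrarily.

```python
import sqlite3

# Toy stand-in for an OMOP CDM database. Table and column names follow OMOP
# CDM conventions; concept IDs and rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1940), (2, 1950), (3, 1960);
INSERT INTO condition_occurrence VALUES
    (1, 1001, '2008-01-10'),  -- index condition
    (1, 2002, '2008-06-01'),  -- outcome inside the time-at-risk window
    (2, 1001, '2008-03-15'),  -- index condition, no outcome
    (3, 2002, '2009-02-01');  -- outcome only, never enters the target cohort
""")

# Target cohort: first occurrence of the index condition per person.
# Outcome label: 1 if the outcome condition starts within 365 days after
# the index date, otherwise 0.
LABEL_QUERY = """
WITH target AS (
    SELECT person_id, MIN(condition_start_date) AS index_date
    FROM condition_occurrence
    WHERE condition_concept_id = :target_concept
    GROUP BY person_id
)
SELECT t.person_id,
       t.index_date,
       EXISTS (
           SELECT 1
           FROM condition_occurrence o
           WHERE o.person_id = t.person_id
             AND o.condition_concept_id = :outcome_concept
             AND o.condition_start_date > t.index_date
             AND o.condition_start_date <= date(t.index_date, '+365 days')
       ) AS outcome
FROM target t
ORDER BY t.person_id
"""


def label_cohort(connection, target_concept, outcome_concept):
    """Return (person_id, index_date, outcome) rows for the target cohort."""
    params = {"target_concept": target_concept, "outcome_concept": outcome_concept}
    return connection.execute(LABEL_QUERY, params).fetchall()


for row in label_cohort(conn, 1001, 2002):
    print(row)
```

The resulting rows, one per person in the target cohort with a binary outcome label, are the kind of input a patient-level prediction model is trained on. In a real study, the cohort logic would come from the ATLAS-generated SQL rather than a hand-written query.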

As part of the SATURN project, we developed a Common Data Model (CDM) utilizing the Observational Medical Outcomes Partnership (OMOP) framework specifically designed for rare diseases. This CDM serves as the foundation for a decision support system aimed at assisting general practitioners in diagnosing rare conditions characterized by unclear symptoms [1].

Technology

We applied established, classical AI models to common patient-level prediction tasks as examples. Both the web-based and the native approach were benchmarked to evaluate the strengths and weaknesses of the R packages used (e.g., PLP, mlr3).
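
To make the benchmarking idea concrete, the sketch below times two prediction pipelines and scores them on the same labelled data. It is a Python illustration under stated assumptions, not the study's R code: `benchmark`, `auroc`, the pipeline names, and the toy data are hypothetical, AUROC is assumed here as the performance metric, and neither PLP nor mlr3 is called.

```python
import time


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def benchmark(pipelines, features, labels):
    """Run each pipeline once and record its AUROC and wall-clock time."""
    results = {}
    for name, predict in pipelines.items():
        start = time.perf_counter()
        scores = predict(features)
        elapsed = time.perf_counter() - start
        results[name] = {"auroc": auroc(labels, scores), "seconds": elapsed}
    return results


# Toy data: (age, number of prior visits) per patient and a binary outcome.
features = [(72, 3), (55, 0), (80, 5), (40, 1), (66, 2), (59, 4)]
labels = [1, 0, 1, 0, 0, 1]

# Hypothetical stand-ins for two implementations of the same prediction task.
pipelines = {
    "pipeline_a": lambda rows: [age / 100 for age, _ in rows],
    "pipeline_b": lambda rows: [0.01 * age + 0.1 * visits for age, visits in rows],
}

results = benchmark(pipelines, features, labels)
for name, metrics in results.items():
    print(name, round(metrics["auroc"], 3), f"{metrics['seconds']:.6f}s")
```

Keeping the data and the metric fixed while swapping only the pipeline is what makes the two numbers comparable. A fuller benchmark would also repeat each run and use held-out data.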

Outlook

The outcomes of this project contribute to the ongoing development of ML-based prediction models on a clinical scale. They provide insights for future research focused on developing patient-level prediction models within the OMOP CDM framework, highlighting the trade-offs and current limitations of different ML approaches.

Publications

  • How to customize Common Data Models for rare diseases: an OMOP-based implementation and lessons learned. Najia Ahmadi, Michele Zoch, Oya Guengoeze, Carlo Facchinello, Antonia Mondorf, … Markus Wolfien, Martin Sedlmayr (pre-print 2023) https://www.researchsquare.com/article/rs-3719430/v1
  • Methods used in the development of Common Data Models for health data – a Scoping Review. Najia Ahmadi, Michele Zoch, Patricia Kelbert, Richard Noll, Jannik Schaaf, Markus Wolfien, Martin Sedlmayr (JMIR Medical Informatics 2023) https://medinform.jmir.org/2023/1/e45116
  • A comparative patient-level prediction study in OMOP CDM demonstrates advantages for native and web-based implementations. Najia Ahmadi, Vu Quang Nguyen, Martin Sedlmayr, Markus Wolfien (pre-print 2023) https://doi.org/10.21203/rs.3.rs-2546267/v1
  • OMOP CDM Can Facilitate Data-Driven Studies for Cancer Prediction: A Systematic Review. Najia Ahmadi, Yuan Peng, Markus Wolfien, Michele Zoch, Martin Sedlmayr (IJMS 2022) https://www.mdpi.com/1422-0067/23/19/11834

Team

Lead

  • Dr. Markus Wolfien

Team Members

  • Najia Ahmadi

Funded by:

  • Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)
  • Free State of Saxony (Freistaat Sachsen)