Project duration: 01.05.2022 – 30.04.2025
Research Area: Medical Informatics, Interoperability, Standardization, OMOP
The project “Standardization of Clinical Data for AI Applications” examines advances and challenges in Medical Informatics, focusing on the use of Common Data Models (CDMs) for health data, particularly in the context of rare diseases and patient-level prediction. We explore the adaptation of the OMOP CDM to specific medical research needs. This includes reviewing methods for developing CDMs, comparing different implementations (web-based and native) in OMOP-based patient-level prediction studies, and conducting prototypical classification studies. Together, these studies examine the potential and limitations of current standardization methodologies in Medical Informatics and emphasize the need for continual development to improve the accuracy and efficiency of AI-powered clinical data analysis and predictive modelling.
In particular, the project addresses challenges in applying AI models to patient-level prediction using standardized clinical data in the OMOP Common Data Model. It evaluates and compares the feasibility of web-based OHDSI tools (such as PLP) and native R-based solutions. The goal is to identify the most effective tools for clinical research in terms of predictive performance, execution time, and ease of implementation, and thereby to guide researchers in selecting suitable methodologies for their medical informatics projects.
As an example, our “Benchmarking-Analysis-of-PLP-vs-MLR3” GitHub repository contains scripts and cohort definitions for comparing the Patient-Level Prediction (PLP) and mlr3 R packages on the SynPUF 5% dataset. Cohorts were defined using the ATLAS platform, and SQL files are provided for the target population and outcome cohort definitions. The repository lists prerequisites such as a PostgreSQL database (or any other database supported by the OMOP CDM) to store and handle the SynPUF 5% dataset. This setup is required for the benchmarking analysis in our current study. More details can be found in the repository.
As part of the SATURN project, we developed a Common Data Model (CDM) utilizing the Observational Medical Outcomes Partnership (OMOP) framework specifically designed for rare diseases. This CDM serves as the foundation for a decision support system aimed at assisting general practitioners in diagnosing rare conditions characterized by unclear symptoms [1].
We applied established, classical AI models to representative patient-level prediction tasks. Both the web-based and the native approach were benchmarked to evaluate the strengths and weaknesses of the R packages used (e.g., PLP, mlr3).
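The following minimal Python sketch illustrates the kind of comparison such a benchmark records: discrimination (here AUC) and execution time for each pipeline on the same patients. It is not the project's code, which is R-based. The two scoring functions, the feature tuples, and the labels are hypothetical stand-ins chosen only to make the example runnable.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, float]  # hypothetical: (age, number of prior visits)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A pair counts 1 if the positive case scores higher, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def benchmark(name: str,
              predict: Callable[[Features], float],
              features: Sequence[Features],
              labels: Sequence[int]) -> Dict[str, object]:
    """Score every patient with one pipeline; record AUC and wall-clock time."""
    start = time.perf_counter()
    scores = [predict(x) for x in features]
    elapsed = time.perf_counter() - start
    return {"pipeline": name, "auc": auc(labels, scores), "seconds": elapsed}


# Made-up toy data, not taken from SynPUF or any study.
features: List[Features] = [(72, 1), (35, 0), (64, 3), (50, 2), (81, 4), (29, 2)]
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical stand-ins for two prediction pipelines.
pipelines = {
    "pipeline_a": lambda x: x[0] / 100.0,  # scores by age only
    "pipeline_b": lambda x: float(x[1]),   # scores by prior visits only
}

results = [benchmark(name, fn, features, labels) for name, fn in pipelines.items()]
for row in results:
    print(row)
```

In a real study the timing would also cover model training and data extraction from the OMOP CDM database, and the AUC would be estimated on held-out data rather than on the scored patients themselves.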
The outcomes of this project contribute to the ongoing development of ML-based prediction models on a clinical scale. They provide insights for future research focused on developing patient-level prediction models within the OMOP CDM framework, highlighting the trade-offs and current limitations of different ML approaches.