Supervisors

Dušan Praščević

Leipzig University

prascevic@informatik.uni-leipzig.de

Dr. Jan Ewald

Leipzig University

jan.ewald@uni-leipzig.de

Author

Moritz Fröhlich

Benchmarking Large-Language-Models for Knowledge-driven Feature Selection in Disease Prognosis on Genetic Data

Status: in progress / Type of Thesis: Bachelor Thesis / Location: Leipzig

The rapid advancement of Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini opens new opportunities for data-driven healthcare research. This bachelor thesis investigates whether the extensive biological and medical knowledge embedded in these models can be leveraged for genetic feature selection in prognostic modeling. Feature selection in genetic datasets is particularly complex because disease-relevant markers must be identified among thousands of gene variants. Traditional statistical approaches often fail to incorporate available biological knowledge about gene functions and disease mechanisms, and may therefore overlook crucial genetic signatures.

This study systematically evaluates multiple state-of-the-art LLMs using various prompting strategies to assess their capability in identifying prognostically relevant genetic features. The models will be benchmarked against each other and compared to established feature selection algorithms as well as random selection baselines. The research aims to determine whether, and under which conditions, LLMs can serve as knowledge-driven tools for feature selection in precision medicine. By bridging the gap between computational linguistics and bioinformatics, this work contributes to the development of better-informed genetic risk models that could improve personalized healthcare decisions.
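The comparison described above (knowledge-driven selection versus an established algorithm versus a random baseline) could be set up roughly as in the following sketch. It is not the thesis's actual pipeline: the data is synthetic, all function names are hypothetical, the "LLM" selection is a hard-coded placeholder rather than a real model response, and a simple mean-difference filter with a nearest-centroid classifier stands in for whichever established methods the study uses.

```python
import random
from statistics import mean


def make_synthetic_expression(n_samples=200, n_genes=50, n_informative=5, seed=0):
    """Toy data (assumption): the first n_informative 'genes' are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def class_means(X, y, features):
    """Per-class mean vector restricted to the given feature indices."""
    means = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        means[label] = [mean(row[j] for row in rows) for j in features]
    return means


def statistical_selection(X, y, k):
    """Univariate filter: rank genes by absolute difference of class means."""
    all_features = list(range(len(X[0])))
    means = class_means(X, y, all_features)
    scores = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(all_features, key=lambda j: scores[j], reverse=True)[:k]


def random_selection(n_genes, k, seed=0):
    """Random baseline: k gene indices drawn without replacement."""
    return random.Random(seed).sample(range(n_genes), k)


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Score a feature subset with a nearest-centroid classifier."""
    centroids = class_means(X_train, y_train, features)
    correct = 0
    for row, label in zip(X_test, y_test):
        point = [row[j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - m) ** 2 for p, m in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(y_test)


def run_benchmark(k=5, seed=0):
    """Evaluate each selection strategy on the same held-out split."""
    X, y = make_synthetic_expression(seed=seed)
    split = len(X) * 7 // 10  # 70% train, 30% test
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    selections = {
        # Hypothetical stand-in for gene indices parsed from an LLM response.
        "llm (placeholder)": [0, 1, 2, 17, 33],
        "statistical": statistical_selection(X_train, y_train, k),
        "random": random_selection(len(X[0]), k, seed=seed),
    }
    return {
        name: nearest_centroid_accuracy(X_train, y_train, X_test, y_test, feats)
        for name, feats in selections.items()
    }


for name, accuracy in run_benchmark().items():
    print(f"{name}: {accuracy:.2f}")
```

Scoring every strategy on the same held-out split keeps the comparison fair. A real study would replace the placeholder list with features parsed from model outputs and would likely use cross-validation and prognosis-specific metrics.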

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.