Advanced Multimodal Learning for Electronic Health Records: Toward Comprehensive and Interpretable Clinical Intelligence
Status: open / Type of thesis: Master thesis / Location: Dresden
This thesis invites you to work at the intersection of machine learning, multimodal AI, and digital health. Electronic Health Records (EHRs) increasingly combine several data types: structured tabular data (diagnoses, procedures, lab values), unstructured clinical text (reports, summaries), medical images (e.g., X-rays), and clinical time series (vital signs, monitoring data). Most current models focus on a single modality, but real clinical understanding requires joint reasoning over all of them.
In this thesis, you will explore how advanced multimodal learning can be used to build unified, interpretable patient representations from heterogeneous EHR data. The work will be supervised at TU Dresden (AI/ML), with the possibility of collaboration with clinical partners. The aim is a methodologically solid thesis, ambitious enough to form the basis for a peer-reviewed publication.
What are the tasks?
- Analyze multimodal EHR data & problem setting
  - Become familiar with typical EHR data types (tabular codes, free text, images, time series).
  - Identify realistic downstream tasks, e.g., risk prediction, length-of-stay estimation, readmission prediction, or anomaly detection.
  - Review recent literature on multimodal learning in healthcare and identify gaps (e.g., limited interpretability, weak use of certain modalities).
- Develop multimodal modeling strategies
  - Implement strong single-modality baselines (e.g., models for tabular data and/or clinical text) as reference points.
  - Design and implement a multimodal learning framework that jointly uses two or more modalities (e.g., tabular + text, tabular + time series, or tabular + images).
  - Experiment with different fusion strategies (early fusion, late fusion, cross-attention, contrastive or representation learning); see the fusion sketch after this list.
  - Optionally explore pre-trained foundation models (e.g., clinical language models or vision encoders) as building blocks.
- Interpretability & clinical reasoning
  - Integrate interpretability techniques (e.g., attention analysis, feature attribution, modality contribution analysis) into your models; a simple contribution probe is sketched after this list.
  - Analyze how each modality contributes to predictions: which signals matter most, when does a modality help, and when does it mislead the model?
  - Propose simple visualizations or explanation schemes that a clinician could plausibly understand.
- Evaluation & analysis
  - Evaluate your models on one or more well-defined prediction or modeling tasks with appropriate metrics (e.g., AUROC, AUPRC, calibration); see the evaluation sketch after this list.
  - Perform ablation studies: what happens if you remove a modality or restrict the model to single-modality baselines?
  - Critically discuss the strengths, weaknesses, potential clinical usefulness, and limitations of your approach.
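To make the fusion terminology above concrete, here is a minimal sketch of early versus late fusion for a tabular + text setup. It assumes PyTorch; the encoder architectures, dimensions, and the binary task head are illustrative placeholders, not a prescribed design.

```python
# A minimal sketch of two fusion strategies for a tabular + text patient
# representation. Module names, dimensions, and the binary task head are
# illustrative assumptions, not a prescribed architecture.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Encode each modality separately, then combine the embeddings."""
    def __init__(self, tab_dim: int, text_dim: int, hidden: int = 128):
        super().__init__()
        self.tab_encoder = nn.Sequential(nn.Linear(tab_dim, hidden), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        # Late fusion: concatenate the per-modality embeddings before the head.
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x_tab, x_text):
        z = torch.cat([self.tab_encoder(x_tab), self.text_encoder(x_text)], dim=-1)
        return self.head(z)  # logit for, e.g., readmission risk

class EarlyFusionClassifier(nn.Module):
    """Concatenate raw feature vectors first, then encode them jointly."""
    def __init__(self, tab_dim: int, text_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(tab_dim + text_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x_tab, x_text):
        return self.net(torch.cat([x_tab, x_text], dim=-1))

# Usage with random stand-in features for a batch of 4 patients:
model = LateFusionClassifier(tab_dim=32, text_dim=768)
logits = model(torch.randn(4, 32), torch.randn(4, 768))
```

Cross-attention or contrastive objectives would replace the concatenation step, but the overall structure (per-modality encoders plus a fusion point) stays the same.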
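For the modality-contribution analysis, one simple probe is to mask out one modality at inference time and measure how far the prediction moves. A hedged sketch, assuming a model with the two-argument interface from the fusion sketch above:

```python
import torch

@torch.no_grad()
def modality_contribution(model, x_tab, x_text):
    """Probe how much each modality moves the prediction when masked out."""
    full = torch.sigmoid(model(x_tab, x_text))
    # Zero-masking is a deliberately crude baseline; a mean or learned
    # "missing modality" embedding is often a better choice in practice.
    no_tab = torch.sigmoid(model(torch.zeros_like(x_tab), x_text))
    no_text = torch.sigmoid(model(x_tab, torch.zeros_like(x_text)))
    return {
        "tabular": (full - no_tab).abs().mean().item(),
        "text": (full - no_text).abs().mean().item(),
    }
```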
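Finally, the evaluation metrics named above are all available in scikit-learn. A minimal sketch with toy labels and probabilities (real values would come from a held-out test split); the ablation studies reuse the same loop with one modality removed:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss
from sklearn.calibration import calibration_curve

# Toy binary labels and predicted probabilities for illustration only.
y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.2, 0.8, 0.3, 0.6, 0.9, 0.4, 0.7, 0.1])

print("AUROC:", roc_auc_score(y_true, y_score))
print("AUPRC:", average_precision_score(y_true, y_score))
print("Brier score:", brier_score_loss(y_true, y_score))

# Points of a reliability curve for a simple calibration plot.
frac_pos, mean_pred = calibration_curve(y_true, y_score, n_bins=4)
print(list(zip(mean_pred.round(2), frac_pos)))
```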
What prerequisites do you need?
- Strong motivation for applying AI/ML to healthcare.
- Good programming skills in Python and experience with deep learning.
- Familiarity with transformer-based or sequence models, and at least basic knowledge of one of: NLP, time-series modeling, or computer vision.
- Very good English skills (for reading literature and writing the thesis).
Why this thesis is special
- High-impact application: Multimodal EHR modeling is central to the future of clinical decision support, risk prediction, and patient safety.
- Technically challenging & modern: You will work with state-of-the-art multimodal and deep learning methods rather than “toy” examples.
- Research proximity: The topic is closely aligned with ongoing research activities, offering a realistic chance for a publication if results are strong.
- Method + insight: You will not only build models, but also analyze why they behave as they do, a crucial step toward trustworthy clinical AI.