Supervisor

Dr. Jan Ewald

Leipzig University

jan.ewald@uni-leipzig.de

Explainable biomedical data modality translation with ontology-based cross-modal autoencoders

Status: open / Type of thesis: Master's thesis / Location: Leipzig

Deep generative models have gained increasing relevance in the biomedical sciences, where data is often collected across multiple modalities, such as gene expression, epigenetic profiles, and clinical phenotypes. Cross-modal variational autoencoders (VAEs) provide a principled framework for learning shared latent representations across such heterogeneous data sources and allow for the translation between modalities (Yang & Uhler, 2019). By mapping one data type into another, these models enable novel applications such as imputing missing modalities, integrating multi-omics datasets, and uncovering latent disease mechanisms.

Despite their success, cross-modal VAEs remain largely opaque, limiting their adoption in domains where biological interpretability is essential. This thesis proposes to address this challenge by developing ontology-based decoders that incorporate structured biological knowledge into the decoding process. Ontologies, such as the Gene Ontology or disease ontologies, encode hierarchical and semantic relationships between entities, making them well-suited to guide model outputs toward biologically meaningful reconstructions and translations. Embedding these structures into the decoder aims to enhance interpretability, reduce spurious correlations, and provide insights aligned with established biomedical knowledge.
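One common way to make a decoder ontology-aware is to let each latent unit stand for an ontology term and to allow connections only to the genes annotated to that term. The following is a minimal sketch of that masking idea in plain Python, not the AUTOENCODIX implementation. The term names, gene names, and the helper functions `build_mask` and `masked_decode` are hypothetical and chosen purely for illustration; a real model would use trained weights in a PyTorch layer.

```python
from typing import Dict, List, Tuple

# Hypothetical toy annotation: ontology term -> annotated genes.
# These names are illustrative only and are not real Gene Ontology data.
TERM_TO_GENES: Dict[str, List[str]] = {
    "term_cell_cycle": ["GENE_A", "GENE_B"],
    "term_apoptosis": ["GENE_B", "GENE_C"],
}
GENES: List[str] = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]


def build_mask(
    term_to_genes: Dict[str, List[str]], genes: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted terms and a terms x genes 0/1 matrix.

    An entry of 1 means the ontology links the term to the gene.
    """
    terms = sorted(term_to_genes)
    mask = [
        [1 if gene in term_to_genes[term] else 0 for gene in genes]
        for term in terms
    ]
    return terms, mask


def masked_decode(
    latent: List[float], weights: List[List[float]], mask: List[List[int]]
) -> List[float]:
    """Linear decoder whose weights are zeroed where the ontology has no link."""
    n_genes = len(mask[0])
    return [
        sum(latent[t] * weights[t][g] * mask[t][g] for t in range(len(latent)))
        for g in range(n_genes)
    ]


terms, mask = build_mask(TERM_TO_GENES, GENES)
# Placeholder weights of 1.0; in practice these would be learned.
weights = [[1.0] * len(GENES) for _ in terms]
# One latent activation per term, in the order given by `terms`.
output = masked_decode([2.0, 3.0], weights, mask)
```

Because every nonzero weight corresponds to a known term-gene annotation, each latent activation can be read as the contribution of one ontology term to the reconstructed profile. In this toy example `GENE_D` has no annotation and therefore always decodes to zero, which shows the trade-off of a strict mask: genes outside the ontology cannot be reconstructed without additional, unconstrained connections.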

The project will be implemented within AUTOENCODIX (https://github.com/jan-forest/autoencodix), an autoencoder framework developed by our research group. AUTOENCODIX offers modular components for training, evaluating, and extending autoencoders, thereby providing an ideal foundation for experimenting with ontology-augmented cross-modal VAEs. The thesis will involve (i) designing and integrating ontology-informed decoder architectures, (ii) benchmarking explainability and predictive performance against standard cross-modal VAEs, and (iii) evaluating the biological plausibility of the learned representations in selected biomedical case studies.

This work will contribute to advancing interpretable machine learning in computational biology, bridging the gap between powerful generative modeling and domain-driven explanatory requirements.

Student profile: Master's student in Data Science, Bioinformatics, or Computer Science with knowledge of and experience in Python programming and PyTorch. Basic understanding of cell biology and some initial experience with molecular omics data.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.
