Explainable biomedical data modality translation with ontology-based cross-modal autoencoders

Status: open / Type of thesis: Master's thesis / Location: Leipzig

Deep generative models have gained increasing relevance in the biomedical sciences, where data is often collected across multiple modalities, such as gene expression, epigenetic profiles, and clinical phenotypes. Cross-modal variational autoencoders (VAEs) provide a principled framework for learning shared latent representations across such heterogeneous data sources and allow for the translation between modalities (Yang & Uhler, 2019). By mapping one data type into another, these models enable novel applications such as imputing missing modalities, integrating multi-omics datasets, and uncovering latent disease mechanisms.
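To make the translation idea concrete, here is a minimal PyTorch sketch, not the method of Yang & Uhler (2019) and not AUTOENCODIX code. It assumes one small VAE per modality with a latent space of the same size, and translates by encoding with the source model and decoding with the target model. The class `ModalityVAE`, the function `translate`, and all feature counts are hypothetical; the training objective that actually aligns the two latent spaces is omitted.

```python
import torch
from torch import nn


class ModalityVAE(nn.Module):
    """Encoder/decoder pair for a single modality (illustrative sizes only)."""

    def __init__(self, n_features: int, latent_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )

    def encode(self, x: torch.Tensor):
        h = self.encoder(x)
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        # z = mu + sigma * eps with eps ~ N(0, I); used during training
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)


def translate(source: ModalityVAE, target: ModalityVAE, x_source: torch.Tensor) -> torch.Tensor:
    """Encode with the source model, decode with the target model."""
    mu, _ = source.encode(x_source)  # posterior mean gives a deterministic translation
    return target.decode(mu)


torch.manual_seed(0)
rna_vae = ModalityVAE(n_features=100)   # hypothetical: 100 expression features
meth_vae = ModalityVAE(n_features=40)   # hypothetical: 40 methylation features
x_rna = torch.randn(5, 100)             # 5 synthetic samples, not real data

with torch.no_grad():
    x_meth_hat = translate(rna_vae, meth_vae, x_rna)

print(x_meth_hat.shape)  # torch.Size([5, 40])
```

With untrained weights the output is meaningless; the sketch only shows the data flow that makes imputing a missing modality possible once both models share an aligned latent space.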

Despite their success, cross-modal VAEs remain largely opaque, limiting their adoption in domains where biological interpretability is essential. This thesis proposes to address this challenge by developing ontology-based decoders that incorporate structured biological knowledge into the decoding process. Ontologies, such as the Gene Ontology or disease ontologies, encode hierarchical and semantic relationships between entities, making them well-suited to guide model outputs toward biologically meaningful reconstructions and translations. Embedding these structures into the decoder aims to enhance interpretability, reduce spurious correlations, and provide insights aligned with established biomedical knowledge.
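One possible way to embed an ontology into a decoder is sketched below in PyTorch. It is an assumption for illustration, not the design this thesis will necessarily adopt: each latent dimension stands for one ontology term, and a binary mask allows a weight only between a term and the genes annotated to it. The gene and term names, `build_mask`, and `OntologyDecoder` are hypothetical, and the ontology is treated as flat, ignoring its hierarchy.

```python
import torch
from torch import nn

# Hypothetical toy annotation: ontology term -> annotated genes
term_to_genes = {
    "term_A": ["g1", "g2"],
    "term_B": ["g2", "g3"],
    "term_C": ["g4"],
}
genes = ["g1", "g2", "g3", "g4"]
terms = list(term_to_genes)


def build_mask(term_to_genes: dict, terms: list, genes: list) -> torch.Tensor:
    """Return a (n_genes, n_terms) 0/1 matrix; 1 where the gene is annotated to the term."""
    gene_index = {g: i for i, g in enumerate(genes)}
    mask = torch.zeros(len(genes), len(terms))
    for j, term in enumerate(terms):
        for g in term_to_genes[term]:
            mask[gene_index[g], j] = 1.0
    return mask


class OntologyDecoder(nn.Module):
    """Linear decoder whose weights are restricted to ontology-supported connections."""

    def __init__(self, mask: torch.Tensor):
        super().__init__()
        n_genes, n_terms = mask.shape
        self.linear = nn.Linear(n_terms, n_genes)  # weight shape: (n_genes, n_terms)
        self.register_buffer("mask", mask)         # fixed, not trained

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Zero out every weight that the ontology does not support
        return nn.functional.linear(z, self.linear.weight * self.mask, self.linear.bias)


torch.manual_seed(0)
decoder = OntologyDecoder(build_mask(term_to_genes, terms, genes))

# Activate only term_C and compare against an all-zero latent vector
z_off = torch.zeros(1, len(terms))
z_on = z_off.clone()
z_on[0, terms.index("term_C")] = 1.0

with torch.no_grad():
    effect = decoder(z_on) - decoder(z_off)

print(effect)  # only the g4 entry can be non-zero
```

Because the masked weights are zero wherever no annotation exists, a change in one term's latent activation can only move the genes annotated to that term, which is what makes each latent dimension readable in biological terms.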

The project will be implemented within AUTOENCODIX (https://github.com/jan-forest/autoencodix), an autoencoder framework developed by our research group. AUTOENCODIX offers modular components for training, evaluating, and extending autoencoders, thereby providing an ideal foundation for experimenting with ontology-augmented cross-modal VAEs. The thesis will involve (i) designing and integrating ontology-informed decoder architectures, (ii) benchmarking explainability and predictive performance against standard cross-modal VAEs, and (iii) evaluating the biological plausibility of the learned representations in selected biomedical case studies.

This work will contribute to advancing interpretable machine learning in computational biology, bridging the gap between powerful generative modeling and domain-driven explanatory requirements.

Student profile: Master's student in Data Science, Bioinformatics, or Computer Science with knowledge of and experience in Python programming and PyTorch. A basic understanding of cell biology and initial experience with molecular omics data are expected.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
Funded by the Free State of Saxony (Freistaat Sachsen).