
Supervisor

Dr. Jan Ewald

Leipzig University

jan.ewald@uni-leipzig.de

Author

Isabella Kreller

Explainable Variational Autoencoders for Genetic Research: A Comparative Study of Post-hoc XAI Methods

Status: finished / Type of thesis: Master's thesis / Location: Leipzig

Despite ongoing medical progress, a substantial number of people worldwide suffer from diseases, with genetic disorders such as cancer and cystic fibrosis (CF) presenting particularly complex challenges. These conditions often exhibit high heterogeneity, making reliable biomarker identification crucial for diagnosis, treatment prediction, and personalized medicine. Autoencoders show promise in extracting meaningful patterns from such data, but their opacity limits their research potential. Explainable AI (XAI) addresses this issue by enhancing model transparency. This study investigated the potential of explainable β-Variational Autoencoders (β-VAE) for disease biomarker discovery, using a model aggregation approach to improve result reliability. Currently, there is limited guidance on selecting the optimal XAI method for VAEs, and no prior research has compared how variations in β-values influence feature importance in the context of XAI. Thus, three post-hoc explainability methods (DeepLiftShap, LIME, and Integrated Gradients) were compared across two disease datasets of varying complexity, focusing on CF and cancer. To assess the impact of latent space disentanglement, the analysis was conducted on two model variants trained with β = 0.01 and β = 1, respectively. To avoid confirmation bias, the top genes were then biologically validated through comparison with established disease markers before being analyzed in relation to their functionality. The results show that DeepLiftShap and Integrated Gradients largely agreed, providing consistent feature importance rankings, while LIME's results diverged. They also revealed that the lower β-value was more effective, especially on the single-cell CF dataset, prioritizing established disease markers. Biologically, the methods identified known disease markers and genes with strong literature support or tissue connections to the target cancers.
These findings suggest the general suitability of β-VAEs for biomarker discovery, though further validation to exclude false positives is needed to ensure reliability. Future work should focus on aggregating results across different model instances or configurations, and on combining post-hoc methods with intrinsically interpretable models.
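The abstract does not state how attributions were aggregated across model instances or how agreement between XAI methods was measured. The following is only a minimal sketch of one way this could look, assuming mean absolute attribution per gene as the aggregate and Jaccard overlap of the top-k genes as the agreement measure. All function names, gene labels, and numbers are hypothetical toy values, not results from the thesis.

```python
from statistics import mean

def aggregate_attributions(runs):
    """Average the absolute attribution of each gene across model instances.

    `runs` is a list of dicts (one per trained model) mapping gene -> score.
    Mean-of-absolute-values is an assumption, not the thesis's stated method.
    """
    genes = runs[0].keys()
    return {g: mean(abs(run[g]) for run in runs) for g in genes}

def top_k(scores, k):
    """Return the set of the k genes with the highest aggregated score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def jaccard(a, b):
    """Jaccard overlap of two gene sets (1.0 means identical sets)."""
    return len(a & b) / len(a | b)

# Synthetic toy attributions for two hypothetical XAI methods,
# each evaluated on two model instances.
method_a_runs = [
    {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": -0.7, "GENE_D": 0.2},
    {"GENE_A": 0.8, "GENE_B": 0.2, "GENE_C": -0.6, "GENE_D": 0.1},
]
method_b_runs = [
    {"GENE_A": 0.7, "GENE_B": 0.3, "GENE_C": -0.1, "GENE_D": 0.6},
    {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": -0.2, "GENE_D": 0.5},
]

top_a = top_k(aggregate_attributions(method_a_runs), k=2)
top_b = top_k(aggregate_attributions(method_b_runs), k=2)
overlap = jaccard(top_a, top_b)
print(sorted(top_a), sorted(top_b), round(overlap, 3))
```

In this toy setup the two methods share one of their top two genes, so the overlap is low; a rank correlation over all genes would be an alternative agreement measure.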

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.

