Privacy-Preserving Variational Autoencoder for Single-Cell Analysis

Type of thesis: Master's thesis / Location: Leipzig / Status of thesis: Open

Detecting diseases, assessing an individual's health status, or monitoring the response to drug treatment requires an understanding of biological processes at the single-cell level. Recent developments in cell sequencing techniques make it possible to isolate cells from heterogeneous tissues cost-effectively and to quantify gene expression precisely in each individual cell. At the core of single-cell data analysis lies the high-dimensional N × M gene expression matrix, where N denotes the number of cells and M the number of genes. To capture the most relevant structure in this matrix and to make it interpretable, sophisticated dimensionality reduction methods for gene expression data are in demand.

Variational Autoencoders (VAEs) are a deep learning technique for learning latent representations of high-dimensional data. They have been successfully applied to dimensionality reduction tasks in various application areas. A VAE learns a low-dimensional latent-space embedding of a dataset by training an encoder-decoder network to reconstruct its input from that embedding, which forces the network to ignore insignificant variation (“noise”). Because single-cell data contains high levels of technical and biological noise, VAEs are a promising approach for analyzing gene expression data.
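As a rough illustration of what training a VAE involves, the sketch below computes the two terms of the usual VAE loss for a single cell: a reconstruction error and a Kullback-Leibler (KL) penalty that pulls the latent code towards a standard normal prior. It is a minimal sketch under stated assumptions, not the method of this thesis: the squared-error reconstruction term (a Gaussian likelihood), the weight beta, and all function names are chosen here for illustration, and count-based likelihoods may suit gene expression data better.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), with log_var = log(sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term under an assumed Gaussian likelihood (up to constants)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def negative_elbo(x, x_hat, mu, log_var, beta=1.0):
    """Loss minimised in training: reconstruction error plus beta-weighted KL term."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one cell with a 2-dimensional latent space.
rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, 0.0]
z = reparameterize(mu, log_var, rng)  # sampled latent code passed to the decoder
```

In a real model, mu and log_var would come from an encoder network applied to one row of the N × M matrix, and x_hat from a decoder applied to z. Sampling through reparameterize keeps the loss differentiable with respect to the encoder outputs.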

Another emerging issue when working with biological or medical data is privacy: human genomic data is the blueprint of each individual encoded in DNA and is thus among the most private kinds of information. When a machine learning model is trained conventionally on private data, attacks such as membership inference attacks may successfully derive information about the training data. This holds true even if the adversary has only limited access to the trained model or its outputs. However, in specific scenarios, publishing a trained model or its outputs is necessary and unavoidable. To preserve the privacy of the training data, the goal of this Master's thesis is the development and implementation of a privacy-preserving VAE for dimensionality reduction of single-cell data.
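To make the threat concrete, the following sketch shows one simple form a membership inference attack can take: the adversary guesses that a record was part of the training data whenever the model's loss on it falls below a threshold, exploiting the fact that models often fit their training data more closely than unseen data. This is only an illustration under assumptions. The function names and the threshold are hypothetical, the loss values are made up rather than measured, and real attacks are typically more elaborate.

```python
def threshold_attack(losses, threshold):
    """Guess 'member' (True) for every record whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, non_member_losses, threshold):
    """Fraction of correct membership guesses over both groups of records."""
    guesses_in = threshold_attack(member_losses, threshold)
    guesses_out = threshold_attack(non_member_losses, threshold)
    correct = sum(guesses_in) + sum(not g for g in guesses_out)
    return correct / (len(member_losses) + len(non_member_losses))


# Made-up per-record reconstruction losses, for illustration only.
member_losses = [0.10, 0.15, 0.20, 0.40]      # records used for training
non_member_losses = [0.35, 0.50, 0.60, 0.70]  # records never seen in training

accuracy = attack_accuracy(member_losses, non_member_losses, threshold=0.30)
```

An accuracy clearly above 0.5 on balanced groups would indicate that the model's outputs leak information about who was in the training set. A privacy-preserving training procedure aims to push such attacks back towards chance level.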


  • Python
  • Interest in Machine Learning with application to Medicine/Biology
  • Interest in Privacy


Anika Hannemann

Universität Leipzig

Privacy Preserving Machine Learning

Vincent David Friedrich


Universität Leipzig

Artificial Intelligence in Medicine