Supervisors

Anika Hannemann

Leipzig University

anika.hannemann@informatik.uni-leipzig.de

Vincent David Friedrich

Leipzig University

vincent_david.friedrich@uni-leipzig.de

Privacy-Preserving Variational Autoencoder for Single-Cell Analysis

Status: in progress / Type of thesis: Master's thesis / Location: Leipzig

Detecting diseases, assessing an individual's health status, or monitoring the response to drug treatment requires an understanding of biological processes at the single-cell level. Recent developments in cell sequencing techniques make it possible to isolate cells from heterogeneous tissues cost-effectively and to quantify gene expression precisely for each individual cell. At the core of single-cell data analysis lies the high-dimensional N × M gene expression matrix, where N denotes the number of cells and M the number of genes. To capture the most relevant structure in this matrix and to make it interpretable, sophisticated dimensionality reduction methods for gene expression data are in demand.

Variational Autoencoders (VAEs) are a deep learning technique for learning latent representations of high-dimensional data. They have been successfully applied to dimensionality reduction tasks in various application areas. A VAE learns a low-dimensional latent embedding of a dataset by training an encoder and a decoder network to reconstruct the input from that embedding, which encourages the model to ignore insignificant variation (“noise”). As single-cell data contains high levels of technical and biological noise, VAEs are a promising approach for analyzing gene expression data.
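To make the idea concrete, the following is a minimal sketch of a single VAE forward pass and its loss on a toy N × M matrix, using NumPy only. It is not the architecture planned for the thesis: the linear encoder and decoder, the untrained random weights, the log-transform, the squared-error reconstruction term, and all names (`W_mu`, `W_logvar`, `W_dec`, `vae_forward`, `vae_loss`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an N x M gene expression matrix (N cells, M genes).
N, M, latent_dim = 8, 20, 2
counts = rng.poisson(lam=2.0, size=(N, M)).astype(float)
x = np.log1p(counts)  # log-transform of counts (an assumed preprocessing step)

# Hypothetical single-layer encoder and decoder weights (untrained).
W_mu = rng.normal(scale=0.1, size=(M, latent_dim))
W_logvar = rng.normal(scale=0.1, size=(M, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim, M))


def vae_forward(x, rng):
    """Encode, sample z with the reparameterization trick, then decode."""
    mu = x @ W_mu                        # mean of q(z | x)
    logvar = x @ W_logvar                # log-variance of q(z | x)
    eps = rng.standard_normal(mu.shape)  # noise drawn independently of the weights
    z = mu + np.exp(0.5 * logvar) * eps  # latent embedding, shape (N, latent_dim)
    x_hat = z @ W_dec                    # reconstruction, shape (N, M)
    return mu, logvar, z, x_hat


def vae_loss(x, x_hat, mu, logvar):
    """Squared-error reconstruction plus KL(q(z|x) || N(0, I)), averaged over cells."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + kl))


mu, logvar, z, x_hat = vae_forward(x, rng)
loss = vae_loss(x, x_hat, mu, logvar)
print(z.shape, x_hat.shape)
```

In this sketch, each row of `z` is the low-dimensional representation of one cell. Training would adjust the weights to minimize `vae_loss`, typically with a deep learning framework and automatic differentiation, and with a likelihood better suited to count data than squared error.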

Another emerging issue when working with biological or medical data is privacy: human genomic data is the blueprint of each individual, encoded in DNA, and is thus among the most private kinds of information. When a machine learning model is trained conventionally on private data, attacks such as membership inference may succeed in deriving information about the training data. This holds true even if the adversary has only limited access to the trained model or its outputs. In some scenarios, however, publishing a trained model or its outputs is necessary and unavoidable. To preserve the privacy of the training data, the goal of this Master's thesis is to develop and implement a privacy-preserving VAE for dimensionality reduction of single-cell data.
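The announcement does not name a specific privacy mechanism. One widely used option, assumed here purely for illustration, is differentially private training in the style of DP-SGD: each example's gradient is clipped to a fixed norm, and Gaussian noise is added before the averaged update is applied. The function name `privatize_gradients` and the parameter values below are hypothetical.

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to clip_norm, sum, add Gaussian noise, average."""
    n_examples, n_params = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down only gradients whose norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Noise standard deviation is proportional to the clipping norm.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=n_params)
    return (clipped.sum(axis=0) + noise) / n_examples


rng = np.random.default_rng(0)
grads = rng.normal(size=(16, 5))  # hypothetical per-cell gradients for 5 parameters
noisy_grad = privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_grad.shape)
```

Clipping bounds how much any single cell can influence an update, and the noise masks the remaining influence. Turning this into a formal privacy guarantee additionally requires tracking the privacy budget over all training steps, which this sketch does not do.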

Interests/Background:

Python
Interest in Machine Learning with application to Medicine/Biology
Interest in Privacy

Funded by:

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.