M.Sc. Thesis: Benchmarking of genome-wide association studies using homomorphic encryption

Type of thesis: Master's thesis / Location: Leipzig / Status of thesis: Open

Fully homomorphic encryption (HE) (Gentry, 2009) is a class of encryption schemes that permits computation directly on encrypted data, without decrypting it first. In principle, arbitrary algorithms can be evaluated this way, and only the holder of the secret key can read the result. HE cryptosystems therefore allow data analysis on private data, including cloud-based computation and data management, without exposing the underlying plaintext.
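To make the idea of computing on encrypted data concrete, the following is a minimal sketch using a toy Paillier cryptosystem. Note the assumptions: Paillier is only additively homomorphic, not fully homomorphic, and is swapped in here purely because it fits in a few lines; the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) and the example counts are hypothetical, and the key size is far too small to be secure.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1 .. n - 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m * n.
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


# Hypothetical example: two sites encrypt their minor-allele counts,
# and a third party sums them without seeing either count.
n, private_key = keygen()
site_a = encrypt(n, 17)
site_b = encrypt(n, 25)
total = decrypt(n, private_key, add_encrypted(n, site_a, site_b))
print(total)  # 42
```

The party calling `add_encrypted` never needs the private key. Fully homomorphic schemes extend this property to multiplication as well, which is what makes arbitrary computations possible.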

This technique is particularly useful for data analysis and machine learning on private data such as genomic data. Human genomic data is the blueprint of each individual, encoded in DNA, and is therefore among the most private information about a person and their relatives. Methods such as genome-wide association studies (GWAS) can reveal correlations between diseases and genetic variations. A GWAS tests genetic markers, such as single-nucleotide polymorphisms (SNPs), for association with a particular trait or disease, commonly using logistic regression. Often only a few genetic markers are sufficient to identify an individual uniquely with high accuracy, which raises questions about privacy and security, especially if data needs to be shared or combined across multiple institutions to enable larger study cohorts for rare diseases.
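As a plaintext illustration of the per-marker test described above, the following sketch fits a logistic regression of case/control status on the genotype of a single SNP. It is an assumption-laden toy, not the method of any particular GWAS tool: the function name `fit_snp_logistic`, the additive 0/1/2 genotype coding, the Wald test, and the twelve-sample data set are all hypothetical, no covariates are included, and nothing here is encrypted.

```python
import math


def fit_snp_logistic(genotypes, phenotypes, iterations=25):
    """Fit logit P(case) = b0 + b1 * genotype by Newton-Raphson.

    Returns the per-allele log odds ratio b1, its standard error,
    and a two-sided Wald p-value.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, y in zip(genotypes, phenotypes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information matrix) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 is the matching diagonal entry of the inverse
    # information matrix (taken from the final iteration).
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b1, se_b1, p_value


# Hypothetical toy data: genotype = number of minor alleles (0, 1, 2),
# phenotype = 1 for cases and 0 for controls.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
phenotypes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
b1, se_b1, p_value = fit_snp_logistic(genotypes, phenotypes)
print(f"log odds ratio per allele: {b1:.4f}, p-value: {p_value:.3f}")
```

In a real study this fit is repeated for every marker, usually with additional covariates. Performing the same arithmetic under homomorphic encryption is considerably harder, because the sigmoid and the division in the Newton step are not natively available on ciphertexts and are typically approximated.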

In this interdisciplinary thesis, GWAS will be performed on genomic data in a privacy-preserving way using homomorphic encryption. Recently published tools and open-source libraries will be used and evaluated against each other to provide a comparative benchmark.

Interests/Background:

  • Python/Data Analytics
  • Interest in Privacy and in Bioinformatics

Contact

Dr. Jan Ewald

Service and Transfer Center

Universität Leipzig

bioinformatics, dynamical systems, machine learning

Anika Hannemann

Universität Leipzig

Privacy Preserving Machine Learning
