
Privacy-Preserving Machine Learning

Title: Privacy-Preserving Machine Learning

Duration: ongoing since April 2020

Research Area: Responsible AI

Privacy-Preserving Machine Learning is a field at the intersection of machine learning and privacy, aiming to develop methodologies that enable the training and deployment of models without compromising the confidentiality of sensitive data. With the proliferation of data-driven technologies, concerns about privacy breaches have become paramount. Research in this domain employs cryptographic techniques such as homomorphic encryption and secure multi-party computation, anonymization methods such as k-anonymity, and Differential Privacy (DP) to design algorithms that operate on encrypted or perturbed data. Additionally, federated learning, a decentralized approach, allows models to be trained across multiple devices without raw data leaving those devices. The overarching goal is to strike a balance between the increasing demand for sophisticated machine learning models and the imperative to protect the privacy of the individuals whose data contributes to their training.

Aims

This project aims to advance Privacy-Preserving Machine Learning (PPML) by investigating existing techniques, analyzing privacy-utility trade-offs, developing new methods, testing their effectiveness, and addressing scalability. The goal is to balance model performance and data utility with privacy, promoting responsible AI and enabling wider adoption of machine learning in sensitive domains, such as personal health and location data.

Problem

A fundamental challenge in PPML is the inherent trade-off between utility and privacy. Protective measures such as noise injection reduce the information available to a model and can therefore degrade its effectiveness. Finding a balance is crucial: overly aggressive privacy measures may leave a model with little practical utility. Current techniques aim to optimize this trade-off, mitigating privacy risks while maintaining model performance.
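To make this trade-off concrete, the following minimal sketch releases the mean of a set of hypothetical sensitive readings under Differential Privacy using the Laplace mechanism. The data, the query, and the epsilon values are illustrative assumptions, not project results; smaller epsilon means stronger privacy but larger error.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sensitive readings, e.g., resting heart rates of 1000 people.
data = rng.normal(loc=75, scale=10, size=1000).clip(40, 180)

def private_mean(values, epsilon, lower=40, upper=180):
    """Differentially private mean via the Laplace mechanism."""
    n = len(values)
    sensitivity = (upper - lower) / n  # max change from altering one record
    noise = rng.laplace(scale=sensitivity / epsilon)
    return float(np.clip(values, lower, upper).mean() + noise)

true_mean = data.mean()
for epsilon in [0.01, 0.1, 1.0, 10.0]:
    estimate = private_mean(data, epsilon)
    print(f"epsilon={epsilon:5.2f}  estimate={estimate:7.2f}  "
          f"error={abs(estimate - true_mean):6.2f}")
# Stronger privacy (smaller epsilon) injects more noise and yields larger error.
```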

Technology

PPML employs techniques like homomorphic encryption and secure multi-party computation for operating on encrypted data. Anonymization methods like k-anonymity and Differential Privacy perturb data to preserve privacy. Federated learning allows decentralized model training, keeping raw data on individual devices. Linking records across existing datasets or training generative models such as GANs can help uncover vulnerabilities, but can also extend the amount of available training data.
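The sketch below illustrates the federated learning idea in its simplest form: each client trains on data that never leaves the device, and only model parameters are sent to a server for averaging, in the spirit of FedAvg. The clients, their simulated data, and the least-squares local training step are illustrative assumptions rather than the project's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_client_data(n=200, d=3):
    """Simulate one client's private dataset for a linear target."""
    X = rng.normal(size=(n, d))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_update(X, y):
    """Client-side training step: ordinary least squares on local data only."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

clients = [make_client_data() for _ in range(5)]

# The server aggregates only model parameters, never raw records.
local_weights = [local_update(X, y) for X, y in clients]
global_weights = np.mean(local_weights, axis=0)
print("aggregated global weights:", np.round(global_weights, 3))
```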

Outlook

We aim to assess privacy risks in health and location data, e.g., from wearables or smartphones. We attempt to breach current anonymization methods with targeted attacks, highlighting the need for stronger privacy preservation. Furthermore, we explore noise injection as a means to protect sensitive personal data and counter re-identification. Additionally, we consider creating private synthetic datasets based on real data to improve the accuracy and scalability of existing models.
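As one concrete form of noise injection during model training, the sketch below follows the DP-SGD recipe of clipping per-example gradients and adding Gaussian noise before each update. It omits privacy accounting, and the dataset and all hyperparameters are illustrative assumptions, not settings used in this project.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary-labelled feature vectors (e.g., derived sensor features).
n, d = 500, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
clip_norm = 1.0         # bound on each example's gradient norm
noise_multiplier = 1.1  # Gaussian noise scale relative to clip_norm
lr, batch_size = 0.5, 50

for step in range(200):
    idx = rng.choice(n, size=batch_size, replace=False)
    per_example_grads = []
    for i in idx:
        # Per-example gradient of the logistic loss.
        g = (sigmoid(X[i] @ w) - y[i]) * X[i]
        # Clip so no single individual dominates the update.
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)
        per_example_grads.append(g)
    noisy_sum = np.sum(per_example_grads, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=d)
    w -= lr * noisy_sum / batch_size

accuracy = ((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean()
print("training accuracy under noisy updates:", round(accuracy, 3))
```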

Publications

  • Lange, Lucas, Nils Wenzlitschke, and Erhard Rahm. “Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection.” arXiv, January, 2024. https://doi.org/10.48550/arXiv.2401.13327.
  • Lange, Lucas, Borislav Degenkolb, and Erhard Rahm. “Privacy-Preserving Stress Detection Using Smartwatch Health Data.” In 4. Interdisciplinary Privacy & Security at Large Workshop, INFORMATIK 2023. Gesellschaft für Informatik e.V., September, 2023. https://doi.org/10.18420/inf2023_66.
  • Vogel, Felix, and Lucas Lange. “Privacy-Preserving Sentiment Analysis on Twitter.” In SKILL 2023. Gesellschaft für Informatik e.V., September, 2023. DOI not yet available.
  • Lange, Lucas, Tobias Schreieder, Victor Christen, and Erhard Rahm. “Privacy at Risk: Exploiting Similarities in Health Data for Identity Inference.” arXiv, August, 2023. https://doi.org/10.48550/arXiv.2308.08310.
  • Lange, Lucas, Maja Schneider, Peter Christen, and Erhard Rahm. “Privacy in Practice: Private COVID-19 Detection in X-Ray Images.” In 20th International Conference on Security and Cryptography (SECRYPT 2023). SciTePress, July, 2023. https://doi.org/10.5220/0012048100003555.
  • Lange, Lucas, Maja Schneider, Peter Christen, and Erhard Rahm. “Privacy in Practice: Private COVID-19 Detection in X-Ray Images (Extended Version).” arXiv, November, 2022. https://doi.org/10.48550/arXiv.2211.11434.
  • Schneider, Maja, Lukas Gehrke, Peter Christen, and Erhard Rahm. “D-TOUR: Detour-Based Point of Interest Detection in Privacy-Sensitive Trajectories.” In 3. Interdisciplinary Privacy & Security at Large Workshop, INFORMATIK 2022. Gesellschaft für Informatik e.V., September, 2022. https://doi.org/10.18420/inf2022_20.
  • Schneider, Maja, Jonathan Schneider, Lea Löffelmann, Peter Christen, and Erhard Rahm. “Tuning the Utility-Privacy Trade-Off in Trajectory Data.” In 26th International Conference on Extending Database Technology (EDBT), 2023. http://dx.doi.org/10.48786/edbt.2023.78.

Team

Lead

  • Prof. Dr. Erhard Rahm

Team Members

  • Maja Schneider
  • Lucas Lange
Funded by:
The Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research).
The Freistaat Sachsen (Free State of Saxony).