Title: Privacy-Preserving Machine Learning
Duration: ongoing since April 2020
Research Area: Responsible AI
Privacy-Preserving Machine Learning is a field at the intersection of machine learning and privacy, aiming to develop methodologies that enable the training and deployment of models without compromising the confidentiality of sensitive data. With the proliferation of data-driven technologies, concerns about privacy breaches have become paramount. Research in this domain employs cryptographic techniques such as homomorphic encryption and secure multi-party computation, anonymization methods such as k-anonymity, and Differential Privacy (DP) to design algorithms that operate on encrypted or perturbed data. Additionally, federated learning, a decentralized approach, allows models to be trained across multiple devices without raw data leaving individual devices. The overarching goal is to strike a balance between the increasing demand for sophisticated machine learning models and the imperative to protect the privacy of the individuals whose data is used to train them.
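To make the anonymization notion concrete, the following sketch checks whether a small tabular dataset satisfies k-anonymity with respect to a set of quasi-identifiers. It is a minimal illustration only; the column names, records, and threshold are invented and not taken from the project.

```python
# Minimal sketch of a k-anonymity check on tabular data (illustrative only;
# the attribute names and the value of k are hypothetical assumptions).
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Toy example: age band and ZIP prefix act as quasi-identifiers.
records = [
    {"age_band": "30-39", "zip_prefix": "891", "diagnosis": "A"},
    {"age_band": "30-39", "zip_prefix": "891", "diagnosis": "B"},
    {"age_band": "40-49", "zip_prefix": "892", "diagnosis": "A"},
]
print(is_k_anonymous(records, ["age_band", "zip_prefix"], k=2))  # False: one group has a single record
```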
This project aims to advance Privacy-Preserving Machine Learning (PPML) by investigating existing techniques, analyzing privacy-utility trade-offs, developing new methods, testing their effectiveness, and addressing scalability. The goal is to balance model performance and data utility with privacy, promoting responsible AI and enabling the wider adoption of machine learning in sensitive domains such as personal health and location data.
A fundamental aspect of PPML is the inherent trade-off between utility and privacy. Protective measures such as noise injection degrade the effectiveness of machine learning models as they are strengthened. Finding a balance is therefore crucial: overly aggressive privacy measures can reduce model utility to the point where the model loses much of its predictive power. Current techniques grapple with optimizing this trade-off, seeking mechanisms that mitigate privacy risks while maintaining model performance.
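As a concrete illustration of this trade-off, the sketch below answers a simple mean query with the Laplace mechanism of Differential Privacy at several privacy budgets ε: the smaller ε is (stronger privacy), the larger the expected error of the released answer. The data and parameter values are illustrative assumptions, not project results.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0, 1, size=1_000)   # toy dataset of values bounded in [0, 1]
true_mean = values.mean()

# Sensitivity of the mean of n values bounded in [0, 1] is 1/n.
sensitivity = 1.0 / len(values)

for epsilon in [0.01, 0.1, 1.0, 10.0]:
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noisy_mean = true_mean + rng.laplace(scale=sensitivity / epsilon)
    print(f"eps={epsilon:>5}: error={abs(noisy_mean - true_mean):.5f}")
```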
PPML employs techniques such as homomorphic encryption and secure multi-party computation to operate directly on encrypted data. Anonymization methods such as k-anonymity generalize or suppress identifying attributes, while Differential Privacy perturbs data or query results to preserve privacy. Federated learning allows decentralized model training, keeping raw data on individual devices. Linking information from existing datasets or training generative adversarial networks (GANs) can help uncover vulnerabilities, but also extend the amount of available data.
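The following sketch illustrates the federated learning idea with a minimal federated averaging (FedAvg) loop over a toy linear regression task: each simulated client takes gradient steps on its local data and only shares updated model weights with the server. The setup (three clients, synthetic data, learning rate, round count) is an assumed toy configuration, not the project's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient steps on a linear model;
    only the updated weights leave the device, never the raw (X, y)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

# Toy setup: three clients hold disjoint local datasets drawn from the same model.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):
    # Each client trains locally; the server averages the returned weights (FedAvg).
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print("estimated weights:", global_w)  # should approach [2, -1]
```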
We aim to assess privacy risks in health and location data, e.g., from wearables or smartphones. We attempt to breach current anonymization methods with re-identification attacks, highlighting the need for stronger privacy preservation. Furthermore, we explore noise injection as a means to protect sensitive personal data and counter re-identification. Additionally, we consider creating private synthetic datasets based on real data, improving the accuracy and scalability of existing models.
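As a simplified example of the kind of re-identification attack we study, the sketch below links a de-identified release to an auxiliary dataset through shared quasi-identifiers; a unique match reveals the sensitive attribute. All names, attributes, and records are invented for illustration.

```python
# Hypothetical linkage attack: join a de-identified release to public
# auxiliary data on quasi-identifiers (toy data, invented attributes).
released = [  # names removed, sensitive attribute kept
    {"age_band": "30-39", "zip_prefix": "891", "sensitive": "condition X"},
    {"age_band": "40-49", "zip_prefix": "892", "sensitive": "condition Y"},
]
auxiliary = [  # e.g., a public register containing identities
    {"name": "Alice", "age_band": "30-39", "zip_prefix": "891"},
    {"name": "Bob",   "age_band": "40-49", "zip_prefix": "892"},
]

quasi_identifiers = ("age_band", "zip_prefix")
for aux in auxiliary:
    matches = [
        rec for rec in released
        if all(rec[q] == aux[q] for q in quasi_identifiers)
    ]
    if len(matches) == 1:  # a unique match re-identifies the individual
        print(f"{aux['name']} -> {matches[0]['sensitive']}")
```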