Contact

Prof. Dr. Erhard Rahm

Department of Computer Science, Database Group, Chair of Databases

Leipzig University

rahm@informatik.uni-leipzig.de

Privacy-Preserving Record Linkage

Title: Privacy-Preserving Record Linkage

Duration: 2014 – today

Research Area: Responsible AI

Record linkage is an essential component of many data integration tasks that involve multiple data sources. It aims to detect records that refer to the same real-world entity, such as a person. Global identifiers are typically lacking, so linkage can only be achieved by comparing available quasi-identifiers, such as name, address, or date of birth. However, data owners are often only willing or allowed to provide their data for such integration if sensitive information is sufficiently protected to ensure the privacy of individuals, such as patients or customers. For example, in medical research, data from several sources (e.g., hospitals) has to be matched to investigate possible correlations between diseases of the same patients without revealing the patients' identities.

Privacy-Preserving Record Linkage (PPRL) addresses this problem and thereby enables sensitive data from different sources to be combined for improved data analysis and research.

Aims

The aim of this project is to study existing methods for Privacy-Preserving Record Linkage and to develop new ones that make it possible to match records while preserving their privacy. For this purpose, the linkage of person-related records is based on encoded values of the quasi-identifiers, and the data needed for analysis (e.g., health data) is separated from these quasi-identifiers. The relevant data can then be provided to a researcher without the identifying data.

Problem

PPRL faces many challenges that must be solved to ensure its practical applicability. In particular, a high degree of privacy has to be ensured through suitable encoding of sensitive data and through organizational structures, such as the use of a trusted linkage unit. PPRL must also achieve high linkage quality by avoiding both false matches and missing matches. Furthermore, high efficiency is needed, with fast linkage times and scalability to large data volumes.
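Linkage quality in this sense is commonly quantified with precision (few false matches) and recall (few missing matches). The following is a minimal sketch of that calculation, assuming a ground truth of true matching pairs is available for evaluation. The function name `linkage_quality` and the record ids are hypothetical and only serve as illustration.

```python
def linkage_quality(predicted: set, truth: set) -> dict:
    """Compare predicted match pairs against known true match pairs.

    Both arguments are sets of (id_in_source_a, id_in_source_b) tuples.
    """
    true_positives = len(predicted & truth)
    # Precision drops with every false match.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall drops with every missing match.
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical evaluation data: 4 predicted pairs, 3 of them correct, 6 true pairs.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"),
         ("a5", "b5"), ("a6", "b6"), ("a7", "b7")}
print(linkage_quality(predicted, truth))
```

In a real PPRL deployment the ground truth is usually unknown, so such measures are typically computed on test datasets only.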

Practical example

PPRL can be applied in many areas, such as public health, demographic studies, and marketing analysis. To support such applications, we developed PRIMAT, an open-source toolbox for the flexible definition and execution of PPRL workflows. It offers modules for data owners and the linkage unit that provide state-of-the-art PPRL methods, including various encoding and hardening techniques, LSH-based blocking, post-processing (clustering), and more.
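To illustrate the idea behind LSH-based blocking, here is a minimal sketch of Hamming LSH on bit-vector encodings: each hash table samples a few fixed bit positions, and records whose bits agree at those positions land in the same block and become candidate pairs. This is not PRIMAT's API. The function names `hlsh_blocks` and `candidate_pairs`, the parameters, and the toy 16-bit encodings are assumptions made for this example.

```python
import random
from collections import defaultdict
from itertools import combinations


def hlsh_blocks(filters: dict, m: int, num_tables: int = 4,
                key_bits: int = 8, seed: int = 42) -> dict:
    """Group record ids into blocks using sampled bit positions.

    filters maps a record id to an m-bit encoding stored as a Python int.
    """
    rng = random.Random(seed)
    # One fixed sample of bit positions per hash table.
    tables = [rng.sample(range(m), key_bits) for _ in range(num_tables)]
    blocks = defaultdict(set)
    for rec_id, bits in filters.items():
        for table_no, positions in enumerate(tables):
            key = tuple((bits >> p) & 1 for p in positions)
            blocks[(table_no, key)].add(rec_id)
    return blocks


def candidate_pairs(blocks: dict) -> set:
    """Records sharing at least one block are compared in detail later."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy encodings: "a" and "b" are identical, "c" differs in every bit.
toy = {
    "a": 0b1111000011110000,
    "b": 0b1111000011110000,
    "c": 0b0000111100001111,
}
print(candidate_pairs(hlsh_blocks(toy, m=16)))  # {('a', 'b')}
```

More tables raise the chance that similar records share a block, while longer keys make blocks smaller; the values used here are arbitrary.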

Technology

We mainly focus on Bloom-filter-based encodings, which have been shown to allow very efficient linkage of large databases while providing sufficient privacy protection. PRIMAT is implemented in Java and can also be used via Dockerized Spring Boot-based web services.
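As a rough illustration of how a Bloom-filter-based encoding works, the sketch below splits a quasi-identifier into character q-grams, maps each q-gram to several bit positions with a keyed hash, and compares two encodings with the Dice coefficient. It is not PRIMAT's implementation. The function names, the use of HMAC-SHA256, the filter length `m`, the number of hash functions `k`, and the secret key are all assumptions chosen for this example.

```python
import hashlib
import hmac


def qgrams(value: str, q: int = 2) -> set:
    """Split a padded, lower-cased string into its set of q-grams."""
    padded = "_" * (q - 1) + value.strip().lower() + "_" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value: str, secret: bytes, m: int = 1024, k: int = 8) -> int:
    """Encode one attribute value as an m-bit Bloom filter (a Python int)."""
    bits = 0
    for gram in qgrams(value):
        for i in range(k):
            # Keyed hash: without the secret, positions cannot be recomputed.
            digest = hmac.new(secret, f"{i}:{gram}".encode("utf-8"),
                              hashlib.sha256).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits |= 1 << position
    return bits


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit vectors; 1.0 means identical bit sets."""
    common = bin(a & b).count("1")
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * common / total if total else 0.0


key = b"shared-secret"  # hypothetical key agreed on by the data owners
print(dice(bloom_encode("meier", key), bloom_encode("meyer", key)))
print(dice(bloom_encode("meier", key), bloom_encode("schmidt", key)))
```

Because similar strings share most of their q-grams, their encodings share most of their set bits, so the linkage unit can compute similarities without seeing the plaintext values.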

Outlook

In future work, we will focus on developing techniques that enable reliable, high-quality linkage results on varying datasets and that provide data custodians with performance indicators. We expect both to be essential for further real-world applications of PPRL.

Publications

Rohde, F., Franke, M., Christen, V., & Rahm, E. (2023). Value-specific Weighting for Record-level Encodings in Privacy-Preserving Record Linkage. BTW 2023.
Rohde, F., Franke, M., Sehili, Z., Lablans, M., & Rahm, E. (2021). Optimization of the Mainzelliste software for fast privacy-preserving record linkage. Journal of Translational Medicine, 19(1), 1-12.
Franke, M., Sehili, Z., Rohde, F., & Rahm, E. (2021). Evaluation of Hardening Techniques for Privacy-Preserving Record Linkage. In EDBT (pp. 289-300).
Sehili, Z., Rohde, F., Franke, M., & Rahm, E. (2021). Multi-party privacy preserving record linkage in dynamic metric space. BTW 2021.
Franke, M., Gladbach, M., Sehili, Z., Rohde, F., & Rahm, E. (2019). ScaDS research on scalable privacy-preserving record linkage. Datenbank-Spektrum, 19, 31-40.
Franke, M., Sehili, Z., & Rahm, E. (2019). PRIMAT: a toolbox for fast privacy-preserving matching. Proceedings of the VLDB Endowment, 12(12), 1826-1829.
Complete list of our PPRL publications: https://dbs.uni-leipzig.de/research/projects/pprl

Team

Lead

Prof. Dr. Erhard Rahm

Team Members

Florens Rohde
Victor Christen (DBS)
Martin Franke (DBS)

Funded by:

Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.