Privacy-Preserving Record Linkage
Record linkage is the process of identifying and matching records that refer to the same entities (often people) across several databases. Generally, no unique entity identifiers (keys) are available, and therefore the linkage has to be based on available common attributes (such as names, addresses, and dates of birth). Besides scalability to large databases (potentially all pairs of records need to be compared) and linkage quality (affected by variations and errors in attribute values, as well as changing and missing values), privacy is a major challenge in record linkage because sensitive personal details are often required for the linkage. The objective of privacy-preserving record linkage (PPRL) is to perform record linkage across organisations using masked (encoded) values such that, besides certain attributes of the matched records, no information about the sensitive source data can be learned by any party involved in the linking, or by an external party. In this talk we will illustrate the significance of PPRL through several real-world scenarios, and introduce the concepts, techniques, algorithms, and research directions of PPRL, with a focus on methods and techniques that allow privacy-preserving linking of large databases in Big Data environments.
Peter Christen is a Professor at the Research School of Computer Science at the Australian National University. He received his Diploma in Computer Science Engineering from ETH Zurich in 1995 and his PhD in Computer Science from the University of Basel in 1999. His research interests are in data mining and data matching (record linkage). He has published over 140 articles in these areas, including the book 'Data Matching', published by Springer in 2012. In 2015 he was co-editor of the book 'Population Reconstruction', also published by Springer. He is the principal developer of the Febrl (Freely Extensible Biomedical Record Linkage) open source data cleaning, de-duplication and record linkage system. Since 2006 Peter has been on the steering committee of the Australasian Data Mining (AusDM) conference series, which he organised in 2013. He is a regular reviewer for top-tier data mining journals and conferences.