Privacy-preserving record linkage (PPRL) methods are essential for integrating sensitive data held by multiple organizations, e.g., for combining patient information in secondary health research. Identifying information such as names and dates of birth is irreversibly encoded before it is sent to a semi-trusted third party for comparison.
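One widely used way to realize such an irreversible encoding is Bloom filter encoding of q-grams, with encoded records compared using the Dice coefficient. It is not necessarily the technique our tools use. The following Python sketch is illustrative only: the filter length, the number of hash functions, the secret key, and all function names are hypothetical. Each value is split into bigrams, every bigram is mapped to several bit positions with keyed HMAC-SHA256 hashing, and the third party compares only the resulting bit sets, never the plaintext.

```python
import hashlib
import hmac

# Hypothetical parameters, chosen only for illustration.
FILTER_LENGTH = 1000  # number of bits in the Bloom filter
NUM_HASHES = 20       # hash functions applied per q-gram
SECRET_KEY = b"hypothetical-key-shared-only-by-data-owners"


def qgrams(value: str, q: int = 2) -> set[str]:
    """Normalize and pad a string, then split it into overlapping q-grams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value: str) -> set[int]:
    """Map each q-gram to NUM_HASHES bit positions via keyed HMAC hashing.

    The Bloom filter is represented as the set of positions set to 1.
    Without SECRET_KEY, the positions cannot simply be recomputed from
    guessed names.
    """
    bits = set()
    for gram in qgrams(value):
        for i in range(NUM_HASHES):
            msg = f"{i}:{gram}".encode()
            digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % FILTER_LENGTH)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two Bloom filters given as sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# The data owners encode; the third party sees only the bit sets.
print(round(dice(bloom_encode("Smith"), bloom_encode("Smyth")), 2))
```

Because similar names share most of their bigrams, they also share most of their bit positions, so approximate matching still works on the encoded data.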
We build tools for defining and executing such PPRL workflows. See some example screenshots below:
Selecting a suitable encoding technique is the main challenge in achieving high-quality linkage results. The appropriate parametrization depends on the specific characteristics of the datasets to be linked. We therefore develop practical analysis tools that support the configuration process. See another screenshot below:
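As a rough illustration of why the parametrization depends on the data, the sketch below assumes a Bloom filter encoding of bigrams (one common PPRL technique, chosen here only as an example). It estimates the average number of bigrams per value in a dataset and derives the number of hash functions that brings the expected fill rate of the filter to roughly 50%, the classic Bloom filter optimum k ≈ (m/n) ln 2 that minimizes the false-positive rate. All function names are hypothetical, and the formulas assume uniformly random, independent hash positions.

```python
import math
from statistics import mean


def qgram_count(value: str, q: int = 2) -> int:
    """Number of distinct q-grams in a normalized, padded value."""
    padded = f"_{value.strip().lower()}_"
    return len({padded[i:i + q] for i in range(len(padded) - q + 1)})


def average_qgrams(values: list[str], q: int = 2) -> float:
    """Dataset characteristic: mean number of distinct q-grams per value."""
    return mean(qgram_count(v, q) for v in values)


def expected_fill_rate(avg_qgrams: float, filter_length: int, num_hashes: int) -> float:
    """Expected fraction of 1-bits after inserting avg_qgrams q-grams,
    each hashed num_hashes times into filter_length bits."""
    return 1 - (1 - 1 / filter_length) ** (num_hashes * avg_qgrams)


def suggest_num_hashes(avg_qgrams: float, filter_length: int) -> int:
    """Number of hash functions giving an expected fill rate of about 50%.

    Solves 1 - (1 - 1/m)^(k*n) = 0.5 for k, which is approximately (m/n) ln 2.
    """
    k = math.log(2) / (-avg_qgrams * math.log1p(-1 / filter_length))
    return max(1, round(k))


names = ["Smith", "Meyer", "Schmidt", "Nguyen"]
avg = average_qgrams(names)
for m in (500, 1000):
    k = suggest_num_hashes(avg, m)
    print(m, k, round(expected_fill_rate(avg, m, k), 3))
```

Datasets with longer values (or more fields encoded into one filter) produce more q-grams per record and therefore call for fewer hash functions or longer filters, which is why a fixed configuration rarely suits every linkage project.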
Although such linkages can be conducted fully automatically, human feedback improves the reliability of the outcome. In particular, manually labeling uncertain match candidates makes it possible to evaluate and improve the predictions, e.g., by integrating active learning methods. To protect the identities of the individuals involved, clerical review interfaces use a masked display that shows only partial information.
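A minimal sketch of both ideas, assuming a hypothetical `Candidate` record that carries a precomputed similarity score: uncertainty sampling, a basic active learning strategy, selects the pairs whose score lies closest to the match threshold, and a simple `mask` function reveals only the first character of each value. The threshold, the review budget, and all names are illustrative assumptions; real clerical review interfaces may use more elaborate masking schemes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical match candidate: a record pair with a similarity score."""
    name_a: str
    name_b: str
    similarity: float


def most_uncertain(candidates: list[Candidate], threshold: float, budget: int) -> list[Candidate]:
    """Uncertainty sampling: return the `budget` pairs whose score is
    closest to the decision threshold, i.e., the least certain ones."""
    return sorted(candidates, key=lambda c: abs(c.similarity - threshold))[:budget]


def mask(value: str, visible: int = 1) -> str:
    """Reveal only the first `visible` characters; replace the rest with '*'."""
    return value[:visible] + "*" * max(0, len(value) - visible)


pairs = [
    Candidate("Smith", "Smyth", 0.72),
    Candidate("Meyer", "Meier", 0.91),
    Candidate("Nguyen", "Schmidt", 0.10),
]
for c in most_uncertain(pairs, threshold=0.8, budget=2):
    # The reviewer sees masked values plus the score, not the full identity.
    print(mask(c.name_a), mask(c.name_b), c.similarity)
```

The labels a reviewer assigns to the selected pairs can then serve as ground truth for evaluating the linkage or as training data for the next round of the classifier.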