Scalable and Accurate Decision-Tree Learning for Entity Resolution

Type of thesis: Bachelor's thesis / Location: Leipzig / Status of thesis: In progress

Entity Resolution (also known as Deduplication, Record Linkage, or Link Discovery) is the task of identifying entities that refer to the same real-world object. Entities are usually matched by computing a similarity between them, which is then used to decide whether they represent the same object. With a plethora of similarity measures and ways to combine them, creating good match conditions can be a cumbersome process of trial and error. This is why machine learning approaches are used to aid in this process.
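The similarity-based matching described above can be illustrated with a minimal sketch. The record fields (`name`, `city`), the two similarity functions, and the thresholds are assumptions chosen for illustration only; they are not taken from FAMER or any specific system. The point is to show what a hand-written match condition looks like and why tuning it by hand quickly becomes trial and error.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on difflib's matching-block ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the lowercased word sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty values are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(rec_a: dict, rec_b: dict,
             name_threshold: float = 0.8,
             city_threshold: float = 0.5) -> bool:
    """Hypothetical hand-written match condition.

    Both attribute similarities must reach their (assumed) threshold.
    """
    name_sim = string_similarity(rec_a["name"], rec_b["name"])
    city_sim = jaccard_similarity(rec_a["city"], rec_b["city"])
    return name_sim >= name_threshold and city_sim >= city_threshold


if __name__ == "__main__":
    a = {"name": "Jon Smith", "city": "Leipzig"}
    b = {"name": "John Smith", "city": "Leipzig"}
    print(is_match(a, b))
```

Every additional attribute, similarity measure, or threshold multiplies the number of possible match conditions, which is the motivation for learning them from labeled examples instead.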

This bachelor thesis consists of integrating the decision-tree-based DRAGON algorithm into FAMER (FAst Multi-source Entity Resolution system), a scalable framework for distributed multi-source entity resolution implemented with Apache Flink™.
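To give a rough idea of how a tree-based learner can derive a match condition from labeled pairs, the sketch below learns a single split (a decision stump) on one similarity feature. This is not the DRAGON algorithm and not FAMER code; the function name `learn_threshold` and the sample data are hypothetical. A full decision tree would repeat this kind of split search recursively over several similarity measures.

```python
def learn_threshold(similarities, labels):
    """Find the threshold on one similarity feature with the fewest errors.

    A pair is predicted to be a match when its similarity is >= the
    threshold. Returns (best_threshold, number_of_misclassified_pairs).
    """
    best_threshold, best_errors = None, len(labels) + 1
    # Only observed similarity values need to be tried as split points.
    for candidate in sorted(set(similarities)):
        errors = sum(
            (sim >= candidate) != label
            for sim, label in zip(similarities, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold, best_errors


if __name__ == "__main__":
    # Hypothetical labeled pairs: similarity value and whether it is a true match.
    sims = [0.95, 0.9, 0.6, 0.3, 0.85]
    is_true_match = [True, True, False, False, True]
    print(learn_threshold(sims, is_true_match))
```

On the sample data, the split at 0.85 separates the labeled matches from the non-matches without error.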

Contact: obraczka@informatik.uni-leipzig.de

Contact person

Daniel Obraczka

Universität Leipzig

Knowledge Graphs
