TF-IDF for Entity Resolution in Huge Knowledge Graphs

Type of thesis: Bachelor's thesis / Location: Leipzig / Status of thesis: In progress

Entity resolution (also known as deduplication, record linkage, or link discovery) is the task of identifying records that refer to the same real-world entity. Entities are usually matched by computing a similarity score between them and then using that score to decide whether they are the same. One such similarity measure is based on tf-idf (term frequency-inverse document frequency), which gives more weight to terms that are frequent in one record but rare across the whole collection.
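The tf-idf similarity described above can be illustrated with a small, single-machine sketch. It rests on several assumptions that are not part of the thesis description: whitespace tokenization, a smoothed idf of log((1 + n) / (1 + df)) + 1, and cosine similarity between the resulting tf-idf vectors. The function names (tokenize, tfidf_vectors, cosine) are hypothetical and are not part of FAMER or Apache Flink.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(records):
    """Return one sparse tf-idf vector (dict: term -> weight) per record."""
    tokenized = [tokenize(record) for record in records]
    n = len(tokenized)
    # Document frequency: the number of records in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times smoothed idf (assumed variant).
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy records standing in for entity labels from different sources.
records = ["Leipzig University", "University of Leipzig", "Max Planck Institute"]
vectors = tfidf_vectors(records)
print(cosine(vectors[0], vectors[1]))  # shared terms, so the score is high
print(cosine(vectors[0], vectors[2]))  # no shared terms, so the score is zero
```

In an entity resolution pipeline, pairs whose score exceeds a chosen threshold would be treated as match candidates. The threshold and the distributed computation of document frequencies are left open here.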

This bachelor thesis consists of implementing tf-idf as a similarity measure for FAMER (FAst Multi-source Entity Resolution system), a scalable framework for distributed multi-source entity resolution implemented with Apache Flink™.

Contact: obraczka@informatik.uni-leipzig.de

Contact person

Daniel Obraczka

Universität Leipzig

Knowledge Graphs
