Construction of a KG equally rich in structural patterns
Status: open / Type of Theses: Seminar Theses / Location: Dresden
Description
Real-world knowledge bases are inherently rich in diverse relational patterns. However, current pipelines for constructing knowledge graphs (KGs) often fail to preserve this diversity, leading to relational pattern bias. Such imbalances limit the evaluation of KG embedding models, since a model's measured performance may depend disproportionately on the prevalence of certain relation types (e.g., symmetric or compositional).
The aim of this project is to construct a KG that is balanced across different relational patterns. The design is inspired by geometric KG embedding models, in which relational properties correspond to elementary geometric transformations (EGTs): Translation, Rotation, Reflection, and Scaling. We will establish a mapping between relational properties and these transformations, including their compositions. Based on this mapping, relations from Wikidata will be systematically identified, grouped into categories, and extracted. A connected subgraph will then be constructed so that each category contributes a comparable number of triples. The resulting KG will serve as a structurally balanced benchmark that mitigates relational pattern bias and supports fairer and more robust evaluation of KG embedding (KGE) models. Existing KGE models will be re-evaluated and ranked according to their performance on this new benchmark.
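To make the link between relational properties and EGTs concrete, the following minimal sketch applies the four transformations to a single point in the complex plane and checks a few algebraic properties. The function names, the 2D setting, and the specific pairing of properties with transformations are assumptions made for illustration only; defining the actual mapping is part of the thesis work.

```python
import cmath
import math


def translate(h: complex, r: complex) -> complex:
    """Translation (TransE-style): add a relation vector."""
    return h + r


def rotate(h: complex, theta: float) -> complex:
    """Rotation (RotatE-style): multiply by a unit complex number."""
    return h * cmath.exp(1j * theta)


def reflect(h: complex, phi: float) -> complex:
    """Reflection across the line through the origin at angle phi."""
    return cmath.exp(2j * phi) * h.conjugate()


def scale(h: complex, s: float) -> complex:
    """Scaling: multiply by a positive real factor."""
    return s * h


def close(a: complex, b: complex, tol: float = 1e-9) -> bool:
    return abs(a - b) < tol


h = complex(0.3, -1.2)  # arbitrary toy entity embedding

# Symmetry needs r(r(h)) == h: rotation by pi and any reflection satisfy it.
rotation_pi_is_involution = close(rotate(rotate(h, math.pi), math.pi), h)
reflection_is_involution = close(reflect(reflect(h, 0.7), 0.7), h)

# A non-zero translation applied twice never returns to h (anti-symmetry).
translation_is_involution = close(translate(translate(h, 1 + 0.5j), 1 + 0.5j), h)

# Inversion: rotating by -theta undoes rotating by theta; 1/s undoes s.
rotation_has_inverse = close(rotate(rotate(h, 0.4), -0.4), h)
scaling_has_inverse = close(scale(scale(h, 2.5), 1 / 2.5), h)

# Composition: two rotations compose into a single rotation by the summed angle.
rotations_compose = close(rotate(rotate(h, 0.4), 1.1), rotate(h, 1.5))

print(rotation_pi_is_involution, reflection_is_involution,
      translation_is_involution, rotation_has_inverse,
      scaling_has_inverse, rotations_compose)
```

Real embedding models work in higher dimensions and learn these parameters from data, so the checks above only illustrate why a given transformation can or cannot represent a given pattern.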
Thesis Objectives
- Survey: Review structural relational patterns in KGs (e.g., symmetry, anti-symmetry, inversion, composition, hierarchy) and their correspondence to EGTs.
- Mapping: Define a mapping between Wikidata relations and geometric transformations (Translation, Rotation, Reflection, Scaling, and compositions).
- Extraction: Design an automated pipeline to extract and categorize relations from Wikidata into balanced EGT groups.
- Construction: Build a connected subgraph with a controlled balance of triples across categories.
- Evaluation: Compare standard KG embedding models on the balanced KG to assess the impact of relational pattern bias.
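The Extraction and Construction objectives can be sketched on toy data as follows. This is a simplified illustration and not the intended pipeline: the relation names, the category labels, and the helper functions `largest_component` and `balance` are all hypothetical, and the sketch uses plain downsampling to the smallest category.

```python
import random
from collections import Counter, defaultdict, deque


def largest_component(triples):
    """Keep triples whose entities lie in the largest weakly connected component."""
    adj = defaultdict(set)
    for h, _, t in triples:
        adj[h].add(t)
        adj[t].add(h)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search over the undirected view
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return [(h, r, t) for h, r, t in triples if h in best]


def balance(triples, category_of, seed=0):
    """Downsample so every category has as many triples as the smallest one."""
    by_cat = defaultdict(list)
    for triple in triples:
        by_cat[category_of[triple[1]]].append(triple)
    k = min(len(group) for group in by_cat.values())
    rng = random.Random(seed)
    sampled = []
    for cat in sorted(by_cat):
        sampled.extend(rng.sample(by_cat[cat], k))
    return sampled


# Toy triples with hypothetical relation names and category labels.
triples = [
    ("a", "spouse", "b"), ("b", "spouse", "a"), ("c", "spouse", "d"),
    ("a", "parent_of", "c"), ("b", "parent_of", "c"),
    ("x", "parent_of", "y"),  # disconnected from the rest
]
category_of = {"spouse": "symmetric", "parent_of": "antisymmetric"}

core = largest_component(triples)
balanced = balance(core, category_of)
counts = Counter(category_of[r] for _, r, _ in balanced)
print(dict(counts))
```

Note that downsampling after the connectivity filter can disconnect the graph again, so a real pipeline would probably need to alternate the two steps or sample under an explicit connectivity constraint.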
Prerequisites
- Strong programming skills in Python (PyTorch preferred) with experience in data processing and graph libraries (e.g., networkx, PyTorch Geometric).
- Ability to query knowledge bases using SPARQL and to preprocess large-scale graph data.
- Familiarity with benchmark design and evaluation of knowledge graph embedding models.
References
- Bordes, Antoine, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. “Translating embeddings for modeling multi-relational data.” Advances in Neural Information Processing Systems 26 (2013).
- Sun, Zhiqing, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. “RotatE: Knowledge graph embedding by relational rotation in complex space.” arXiv preprint arXiv:1902.10197 (2019).
- Chami, Ines, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. “Low-dimensional hyperbolic knowledge graph embeddings.” arXiv preprint arXiv:2005.00545 (2020).