Supervisor

Prof. Dr.-Ing. Michael Färber

Chair of Scalable Software Architectures for Data Analytics

TUD Dresden University of Technology

michael.faerber@tu-dresden.de

A F M Mohimenul Joaa

Chair of Scalable Software Architectures for Data Analytics

TUD Dresden University of Technology

a_f_m_mohimenul.joaa@mailbox.tu-dresden.de

LLM-Augmented Knowledge Graph Search & Question Answering with MCP Servers on SemOpenAlex

Status: open / Type of thesis: Master's thesis / Location: Dresden

This thesis invites you to work at the intersection of Large Language Models (LLMs), Knowledge Graphs (KGs), and machine-actionable search APIs. At metaphacts, MCP (Model Context Protocol) servers expose powerful, LLM-driven tools for search, discovery, SPARQL execution, entity descriptions, and more — directly on top of large knowledge graphs such as SemOpenAlex and LinkedPapersWithCode (LPWC).

In this thesis, you will design and evaluate an LLM-based system that leverages MCP server endpoints for KG-encoded scholarly data. A central component will be integrating or extending KGNode, a subgraph extraction approach for LLM-based question answering. The system should enable high-quality KG question answering, intelligent subgraph retrieval, and LLM reasoning on top of large scholarly KGs — using MCP as the interface layer.

The thesis will be supervised at TU Dresden in close collaboration with metaphacts. The aim is a technically strong thesis with clear research contributions and realistic potential for a peer-reviewed publication.

What are the tasks?

Understand the MCP server ecosystem & scholarly KGs

Explore how MCP servers expose KG tools (entity search, entity description, SPARQL query execution).
Study SemOpenAlex and LPWC schemas, entity types, and link structures.
Analyze typical query and reasoning needs in the scientific domain (e.g., “Which papers connect topic X and method Y?”).
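
To make the example question above concrete, the following minimal sketch builds a SPARQL query for papers linked to both a topic and a method. The function name, the predicate parameters, and the example.org IRIs are hypothetical placeholders; the actual SemOpenAlex and LPWC predicates need to be taken from the respective schemas.

```python
def build_connection_query(topic_iri: str, method_iri: str,
                           topic_pred: str, method_pred: str,
                           limit: int = 10) -> str:
    """Return a SPARQL SELECT for papers linked to both a topic and a method.

    All IRIs are supplied by the caller; nothing here is taken from the
    real SemOpenAlex or LPWC ontologies.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT DISTINCT ?paper WHERE {\n"
        f"  ?paper <{topic_pred}> <{topic_iri}> .\n"
        f"  ?paper <{method_pred}> <{method_iri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder IRIs for illustration only (assumptions, not real schema terms).
query = build_connection_query(
    topic_iri="https://example.org/topic/X",
    method_iri="https://example.org/method/Y",
    topic_pred="https://example.org/ontology/hasTopic",
    method_pred="https://example.org/ontology/usesMethod",
    limit=5,
)
print(query)
```

A query of this shape only finds direct links; questions that require multi-hop connections are where subgraph extraction becomes relevant.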

Extend or integrate KGNode for MCP-based KGQA

Work with KGNode, our subgraph extraction framework (semantic seed discovery + path-aware chain traversal).
Adapt KGNode to consume MCP endpoints for:
◦ entity lookup,
◦ hybrid search,
◦ relevant-path retrieval,
◦ SPARQL-based filtering.
Evaluate whether MCP’s toolset improves precision, coverage, or efficiency of subgraph extraction.
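
As a rough illustration of what consuming an MCP endpoint involves: MCP is built on JSON-RPC 2.0, and a tool is invoked with the tools/call method. The sketch below only constructs such a request. The tool name entity_lookup and its query argument are assumptions for illustration; the tools actually offered by the metaphacts MCP servers have to be discovered from the server itself.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def make_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "entity_lookup" and its "query" argument are hypothetical names.
request = make_tool_call("entity_lookup", {"query": "knowledge graph embedding"})
payload = json.dumps(request)
print(payload)
```

Keeping request construction separate from transport makes it easier to log and replay tool calls when comparing MCP-based retrieval against a baseline.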

Build an LLM-augmented KGQA system

Design a pipeline where an LLM:
◦ interprets user questions,
◦ uses MCP tools to discover relevant nodes/paths,
◦ obtains a clean subgraph with KGNode or similar logic,
◦ answers the question grounded in KG evidence.
Experiment with different LLMs.
Explore techniques such as structured prompting, function calling, or self-refinement loops.
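
The four pipeline stages above can be sketched as pluggable functions. This is a toy illustration under stated assumptions, not the KGNode implementation: the class KGQAPipeline, the stage names, and the in-memory toy graph are all hypothetical, and in a real system the stages would be backed by an LLM, MCP tool calls, and KGNode.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class KGQAPipeline:
    interpret: Callable[[str], List[str]]               # question -> seed mentions
    discover: Callable[[List[str]], List[Triple]]       # mentions -> candidate triples
    prune: Callable[[str, List[Triple]], List[Triple]]  # candidates -> clean subgraph
    answer: Callable[[str, List[Triple]], str]          # subgraph -> grounded answer

    def run(self, question: str) -> Tuple[str, List[Triple]]:
        mentions = self.interpret(question)
        candidates = self.discover(mentions)
        subgraph = self.prune(question, candidates)
        # Return the evidence alongside the answer so grounding can be checked.
        return self.answer(question, subgraph), subgraph


# Toy stand-ins for the LLM, the MCP tools, and the subgraph extractor.
TOY_KG: List[Triple] = [
    ("paper:1", "mentions", "bert"),
    ("paper:2", "mentions", "gpt"),
]
VOCAB = {"bert", "gpt"}

pipeline = KGQAPipeline(
    interpret=lambda q: [w for w in q.lower().rstrip("?").split() if w in VOCAB],
    discover=lambda ms: [t for t in TOY_KG if t[2] in ms],
    prune=lambda q, ts: list(dict.fromkeys(ts)),  # de-duplicate, keep order
    answer=lambda q, ts: ", ".join(sorted({s for s, _, _ in ts})),
)

result, evidence = pipeline.run("Which papers mention BERT?")
print(result, evidence)
```

Injecting each stage as a callable makes it straightforward to swap LLMs or retrieval strategies while keeping the rest of the pipeline fixed.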

Explore additional MCP-enabled use cases (optional)

Intelligent entity exploration (“Explain this author’s research trajectory”).
KG-driven summarization (“Summarize the citations and influence of this paper”).
Dataset generation (splitting SemOpenAlex into topic-aligned subgraphs).

Evaluation & analysis

Use real scholarly queries to evaluate correctness, grounding quality, and relevance.
Compare MCP-supported retrieval against baseline KGNode or pure SPARQL approaches.
Analyze system behavior on different graph regions (dense vs. sparse).
Identify limitations caused by KG structure, MCP interface boundaries, or LLM reasoning.
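
One simple way to compare retrieval variants is set-based precision and recall over extracted triples. The sketch below assumes a gold set of relevant triples is available per question; the function name and the toy triples are hypothetical, and the metrics actually used in the thesis are open to design.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def triple_precision_recall(retrieved: Iterable[Triple],
                            gold: Iterable[Triple]) -> Tuple[float, float]:
    """Set-based precision and recall of a retrieved subgraph against gold triples."""
    retrieved_set, gold_set = set(retrieved), set(gold)
    hits = len(retrieved_set & gold_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy data: four retrieved triples, two of which are among three gold triples.
retrieved = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d"), ("d", "p", "e")]
gold = [("a", "p", "b"), ("b", "p", "c"), ("x", "p", "y")]

precision, recall = triple_precision_recall(retrieved, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting these numbers separately for dense and sparse graph regions would show whether a retrieval strategy degrades where the graph has few links.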

What prerequisites do you need?

Strong interest in LLMs, Knowledge Graphs, and retrieval systems.
Good programming skills in Python (experience with APIs, transformers, or SPARQL is a plus).
Curiosity to design and evaluate intelligent AI-driven search systems.
Very good English for reading and writing.

Why this thesis is special

Real-world deployment: You will work with production-grade MCP servers used in enterprise and research environments.
Novel combination: To our knowledge, the fusion of MCP tools + KGNode + SemOpenAlex has not yet been explored in the literature.
Impact in scientific ecosystems: High potential to power next-generation scholarly assistants (KGQA, discovery tools, explainers).
Publication potential: The topic sits at the intersection of LLM-based reasoning, KG retrieval, and system design.
Industry collaboration: Close exchange with metaphacts, including access to MCP setups and practical insight into knowledge-driven LLM tooling.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.