LLM-Augmented Knowledge Graph Search & Question Answering with MCP Servers on SemOpenAlex
Status: open / Type of Theses: Master theses / Location: Dresden
This thesis invites you to work at the intersection of Large Language Models (LLMs), Knowledge Graphs (KGs), and machine-actionable search APIs. At metaphacts, MCP (Model Context Protocol) servers expose powerful, LLM-driven tools for search, discovery, SPARQL execution, entity descriptions, and more — directly on top of large knowledge graphs such as SemOpenAlex and LinkedPapersWithCode (LPWC).
In this thesis, you will design and evaluate an LLM-based system that leverages MCP server endpoints for KG-encoded scholarly data. A central component will be integrating or extending KGNode, a subgraph extraction approach for LLM-based question answering. The system should enable high-quality KG question answering, intelligent subgraph retrieval, and LLM reasoning on top of large scholarly KGs — using MCP as the interface layer.
The thesis will be supervised at TU Dresden with close collaboration with metaphacts. The aim is a technically strong thesis with clear research contributions and realistic potential for a peer-reviewed publication.
What are the tasks?
Understand the MCP server ecosystem & scholarly KGs
- Explore how MCP servers expose KG tools (entity search, description, SPARQL transactions).
- Study SemOpenAlex and LPWC schemas, entity types, and link structures.
- Analyze typical query and reasoning needs in the scientific domain (e.g., “Which papers connect topic X and method Y?”).
Extend or integrate KGNode for MCP-based KGQA
- Work with KGNode, our subgraph extraction framework (semantic seed discovery + path-aware chain traversal).
- Adapt KGNode to consume MCP endpoints for:
◦ entity lookup,
◦ hybrid search,
◦ relevant-path retrieval,
◦ SPARQL-based filtering.
- Evaluate whether MCP’s toolset improves precision, coverage, or efficiency of subgraph extraction.
Build an LLM-augmented KGQA system
- Design a pipeline where an LLM:
◦ interprets user questions,
◦ uses MCP tools to discover relevant nodes/paths,
◦ obtains a clean subgraph with KGNode or similar logic,
◦ answers the question grounded in KG evidence.
- Experiment with different LLMs.
- Explore techniques such as structured prompting, function calling, or self-refinement loops.
Explore additional MCP-enabled use cases (optional)
- Intelligent entity exploration (“Explain this author’s research trajectory”).
- KG-driven summarization (“Summarize the citations and influence of this paper”).
- Dataset generation (splitting SemOpenAlex into topic-aligned subgraphs).
Evaluation & analysis
- Use real scholarly queries to evaluate correctness, grounding quality, and relevance.
- Compare MCP-supported retrieval against baseline KGNode or pure SPARQL approaches.
- Analyze system behavior on different graph regions (dense vs. sparse).
- Identify limitations caused by KG structure, MCP interface boundaries, or LLM reasoning.
What prerequisites do you need?
- Strong interest in LLMs, Knowledge Graphs, and retrieval systems.
- Good programming skills in Python (experience with APIs, transformers, or SPARQL is a plus).
- Curiosity to design and evaluate intelligent AI-driven search systems.
- Very good English for reading and writing.
Why this thesis is special
- Real-world deployment: You will work with production-grade MCP servers used in enterprise and research environments.
- Novel combination: The fusion of MCP tools + KGNode + SemOpenAlex has not been explored in the literature.
- Impact in scientific ecosystems: High potential to power next-generation scholarly assistants (KGQA, discovery tools, explainers).
- Publication potential: The topic sits at the intersection of LLM-based reasoning, KG retrieval, and system design.
- Industry collaboration: Close exchange with metaphacts, including access to MCP setups and practical insight into knowledge-driven LLM tooling.