

LLM-Augmented Knowledge Graph Search & Question Answering with MCP Servers on SemOpenAlex

Status: open / Type of thesis: Master thesis / Location: Dresden

This thesis invites you to work at the intersection of Large Language Models (LLMs), Knowledge Graphs (KGs), and machine-actionable search APIs. At metaphacts, MCP (Model Context Protocol) servers expose powerful, LLM-driven tools for search, discovery, SPARQL execution, entity descriptions, and more — directly on top of large knowledge graphs such as SemOpenAlex and LinkedPapersWithCode (LPWC).

In this thesis, you will design and evaluate an LLM-based system that leverages MCP server endpoints for KG-encoded scholarly data. A central component will be integrating or extending KGNode, a subgraph extraction approach for LLM-based question answering. The system should enable high-quality KG question answering, intelligent subgraph retrieval, and LLM reasoning on top of large scholarly KGs — using MCP as the interface layer.

The thesis will be supervised at TU Dresden in close collaboration with metaphacts. The aim is a technically strong thesis with clear research contributions and realistic potential for a peer-reviewed publication.

What are the tasks?

Understand the MCP server ecosystem & scholarly KGs

  • Explore how MCP servers expose KG tools (entity search, entity descriptions, SPARQL query execution).
  • Study SemOpenAlex and LPWC schemas, entity types, and link structures.
  • Analyze typical query and reasoning needs in the scientific domain (e.g., “Which papers connect topic X and method Y?”).
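Questions like the one above typically translate into SPARQL over the SemOpenAlex graph. The following sketch builds such a query; the class and property names (soa:Work, soa:hasConcept) and the use of skos:prefLabel are assumptions based on the public SemOpenAlex ontology and should be verified against the live schema before use.

```python
# Sketch: build a SPARQL query for "Which papers connect topic X and
# method Y?" over SemOpenAlex. Vocabulary terms are illustrative
# assumptions, not a verified copy of the SemOpenAlex schema.

def papers_connecting(topic_label: str, method_label: str, limit: int = 10) -> str:
    """Build a SPARQL query for works tagged with both concept labels."""
    return f"""
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?work ?title WHERE {{
  ?work a soa:Work ;
        dcterms:title ?title ;
        soa:hasConcept ?c1 ;
        soa:hasConcept ?c2 .
  ?c1 skos:prefLabel "{topic_label}"@en .
  ?c2 skos:prefLabel "{method_label}"@en .
}}
LIMIT {limit}
""".strip()

print(papers_connecting("Knowledge graph", "Transformer"))
```

In the thesis, such queries would not be hand-written but issued through the MCP server's SPARQL tool, with the LLM (or KGNode) choosing the concepts to bind.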

Extend or integrate KGNode for MCP-based KGQA

  • Work with KGNode, our subgraph extraction framework (semantic seed discovery + path-aware chain traversal).
  • Adapt KGNode to consume MCP endpoints for:
    ◦ entity lookup,
    ◦ hybrid search,
    ◦ relevant-path retrieval,
    ◦ SPARQL-based filtering.
  • Evaluate whether MCP’s toolset improves precision, coverage, or efficiency of subgraph extraction.
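The core adaptation can be pictured as swapping KGNode's data access for MCP tool calls. The sketch below models seed discovery plus bounded path-aware traversal against a toy in-memory graph; the tool names (entity_search, neighbors) and the traversal logic are illustrative assumptions — the real MCP servers and KGNode expose richer interfaces (hybrid search, relevant-path retrieval, SPARQL-based filtering).

```python
# Minimal sketch of KGNode-style subgraph extraction on top of abstract
# MCP tools. A dict stands in for MCP server responses; tool names and
# internals are assumptions for illustration only.
from collections import deque

# Toy in-memory KG standing in for MCP server responses.
EDGES = {
    "paper:1": ["topic:KG", "method:GNN"],
    "topic:KG": ["paper:2"],
    "paper:2": ["method:GNN"],
}

def entity_search(query: str) -> list[str]:
    """Stand-in for an MCP entity-search tool: return seed entity IDs."""
    return [e for e in EDGES if query.lower() in e.lower()]

def neighbors(entity: str) -> list[str]:
    """Stand-in for an MCP tool returning adjacent entities."""
    return EDGES.get(entity, [])

def extract_subgraph(question_terms: list[str], max_hops: int = 2) -> set[str]:
    """Semantic seed discovery + bounded breadth-first chain traversal."""
    seeds = [s for term in question_terms for s in entity_search(term)]
    visited, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for nb in neighbors(node):
            if nb not in visited:
                visited.add(nb)
                frontier.append((nb, hops + 1))
    return visited

print(sorted(extract_subgraph(["paper"])))
```

Evaluating the MCP toolset then amounts to measuring whether replacing these stand-ins with real endpoints changes the precision, coverage, or cost of the extracted subgraphs.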

Build an LLM-augmented KGQA system

  • Design a pipeline where an LLM:
    ◦ interprets user questions,
    ◦ uses MCP tools to discover relevant nodes/paths,
    ◦ obtains a clean subgraph with KGNode or similar logic,
    ◦ answers the question grounded in KG evidence.
  • Experiment with different LLMs.
  • Explore techniques such as structured prompting, function calling, or self-refinement loops.
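The pipeline above is essentially a function-calling control loop: the LLM emits tool calls, the system executes MCP tools, and the final answer is generated from the retrieved evidence. The sketch below stubs out the LLM; the message format and tool names are illustrative assumptions, not a specific vendor API.

```python
# Sketch of the KGQA control loop with function calling. The LLM is a
# stub; in the thesis it would be a real model whose tool calls map to
# MCP endpoints (and KGNode for subgraph retrieval).
import json

def fake_llm(messages: list[dict]) -> dict:
    """Stub LLM: first asks for a subgraph, then answers from tool output."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_subgraph",
                              "arguments": json.dumps({"terms": ["paper"]})}}
    evidence = messages[-1]["content"]
    return {"answer": f"Grounded in KG evidence: {evidence}"}

TOOLS = {
    # Stand-in for MCP/KGNode subgraph retrieval.
    "get_subgraph": lambda terms: sorted(["paper:1", "topic:KG"]),
}

def answer(question: str) -> str:
    """Run the loop until the LLM produces a final, evidence-grounded answer."""
    messages = [{"role": "user", "content": question}]
    while True:
        reply = fake_llm(messages)
        if "answer" in reply:
            return reply["answer"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": json.dumps(result)})

print(answer("Which papers mention topic KG?"))
```

Structured prompting and self-refinement fit naturally into this loop, e.g. by having the LLM critique the retrieved subgraph and issue a revised tool call before answering.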

Explore additional MCP-enabled use cases (optional)

  • Intelligent entity exploration (“Explain this author’s research trajectory”).
  • KG-driven summarization (“Summarize the citations and influence of this paper”).
  • Dataset generation (splitting SemOpenAlex into topic-aligned subgraphs).

Evaluation & analysis

  • Use real scholarly queries to evaluate correctness, grounding quality, and relevance.
  • Compare MCP-supported retrieval against baseline KGNode or pure SPARQL approaches.
  • Analyze system behavior on different graph regions (dense vs. sparse).
  • Identify limitations caused by KG structure, MCP interface boundaries, or LLM reasoning.
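One concrete way to operationalize the retrieval comparison is set-based precision/recall/F1 over retrieved entities against a gold reference subgraph per query. This is a minimal sketch under that assumption; grounding quality of generated answers would need a separate judgment step.

```python
# Sketch: per-query retrieval-quality metrics for comparing an
# MCP-supported pipeline against a baseline on the same gold subgraph.
# Set-based P/R/F1 over entity IDs is one reasonable metric choice.

def prf(retrieved: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one query."""
    tp = len(retrieved & gold)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one pipeline's output scored against a toy gold set.
gold = {"paper:1", "paper:2", "topic:KG"}
p, r, f = prf({"paper:1", "topic:KG", "method:GNN"}, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Aggregating these scores separately over dense and sparse graph regions would surface the behavioral differences the analysis targets.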

What prerequisites do you need?

  • Strong interest in LLMs, Knowledge Graphs, and retrieval systems.
  • Good programming skills in Python (experience with APIs, transformers, or SPARQL is a plus).
  • Curiosity to design and evaluate intelligent AI-driven search systems.
  • Very good English reading and writing skills.

Why this thesis is special

  • Real-world deployment: You will work with production-grade MCP servers used in enterprise and research environments.
  • Novel combination: The fusion of MCP tools + KGNode + SemOpenAlex has not been explored in the literature.
  • Impact in scientific ecosystems: High potential to power next-generation scholarly assistants (KGQA, discovery tools, explainers).
  • Publication potential: The topic sits at the intersection of LLM-based reasoning, KG retrieval, and system design.
  • Industry collaboration: Close exchange with metaphacts, including access to MCP setups and practical insight into knowledge-driven LLM tooling.
Funded by the Federal Ministry of Education and Research and by the Free State of Saxony.