

Exploring Citation Generation with Large Language Models: A Comparative Study of Prompting Strategies

Status: in progress / Type of thesis: Master's thesis / Location: Dresden

 

Large Language Models (LLMs) have revolutionized natural language processing tasks, including automated question answering, summarization, and content generation. A critical aspect of ensuring the trustworthiness of these models is citation generation: the ability of LLMs to provide accurate and relevant references to support their outputs. This ability remains a challenge, particularly because the models lack direct access to up-to-date and verified knowledge sources. Citation generation requires not only fluent text but also strong alignment between the generated content and verifiable external sources.

 

This thesis focuses on the implementation and evaluation of different prompting strategies for citation generation with LLMs, comparing techniques such as zero-shot prompting, few-shot prompting, and chain-of-verification prompting. The study will further explore the impact of Retrieval-Augmented Generation (RAG), where external documents (e.g. scientific papers) are incorporated to assist LLMs in generating more precise and verifiable citations.
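To make the compared strategies concrete, the following is a minimal Python sketch of how zero-shot, few-shot, and retrieval-augmented prompts for citation generation could be assembled. The question, example answers, passages, and templates are illustrative placeholders, and call_llm stands in for whichever model API is eventually used; none of these names are taken from the referenced papers.

    # Sketch of the compared prompting strategies; all content is illustrative.

    def zero_shot_prompt(question):
        """Ask for a cited answer, relying only on the model's parametric knowledge."""
        return (
            "Answer the question and cite the sources you rely on "
            "(e.g. paper titles or DOIs).\n\n"
            f"Question: {question}\nAnswer:"
        )

    def few_shot_prompt(question, examples):
        """Prepend worked (question, cited answer) pairs that demonstrate the format."""
        shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
        return shots + "\n\n" + zero_shot_prompt(question)

    def rag_prompt(question, passages):
        """Include retrieved passages and ask the model to cite them as [1], [2], ..."""
        documents = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
        return (
            "Answer the question using only the documents below and cite them "
            "as [1], [2], ...\n\n"
            f"Documents:\n{documents}\n\nQuestion: {question}\nAnswer:"
        )

    # Chain-of-verification would add further rounds (draft answer, verification
    # questions, revised answer); it is omitted here for brevity.

    def call_llm(prompt):
        """Placeholder for the actual model call (API endpoint or local model)."""
        raise NotImplementedError

    if __name__ == "__main__":
        passages = [
            "Retrieval-augmented generation conditions the generator on passages "
            "retrieved from an external corpus ...",
            "Grounding answers in retrieved evidence makes outputs easier to verify ...",
        ]
        print(rag_prompt("How does RAG support citation generation?", passages))

In the retrieval-augmented variant, the numbered passages give the model explicit, checkable citation targets, which is the main contrast with the purely parametric prompting strategies the thesis will compare.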

Research Questions

  • How do different prompting strategies impact the accuracy of citations generated by LLMs? (One way to measure citation accuracy is sketched after this list.)
  • What is the effect of integrating RAG on the citation generation capabilities of LLMs?
  • How do different LLM architectures compare in their ability to generate correct and relevant citations?
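As one possible way to operationalize citation accuracy for the first research question, the sketch below computes citation precision and recall per generated statement against sets of supporting passages, loosely in the spirit of the citation metrics of Gao et al. [1]. The data format and function name are assumptions made for illustration; in practice the "gold" sets could come from human annotation or from an NLI-based entailment check.

    # Sketch: citation precision/recall over generated statements (illustrative format).

    def citation_precision_recall(predicted, gold):
        """predicted/gold: one set of citation IDs per generated statement."""
        true_positives = sum(len(p & g) for p, g in zip(predicted, gold))
        n_predicted = sum(len(p) for p in predicted)
        n_gold = sum(len(g) for g in gold)
        precision = true_positives / n_predicted if n_predicted else 0.0
        recall = true_positives / n_gold if n_gold else 0.0
        return precision, recall

    if __name__ == "__main__":
        # Two generated statements: the citations the model produced vs. the
        # passages that actually support each statement.
        predicted = [{"doc1"}, {"doc2", "doc3"}]
        gold = [{"doc1"}, {"doc2"}]
        p, r = citation_precision_recall(predicted, gold)
        print(f"citation precision = {p:.2f}, recall = {r:.2f}")

Averaging such scores across questions would yield one comparable number per prompting strategy and per model.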

Prerequisites

  • Very good programming skills in Python
  • Basic understanding of RAG

References

[1] Gao, T., Yen, H., Yu, J., Chen, D.: Enabling large language models to generate text with citations. In: Proceedings of EMNLP 2023, pp. 6465–6488 (2023), https://aclanthology.org/2023.emnlp-main.398

[2] Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., Weston, J.: Chain-of-verification reduces hallucination in large language models. In: Findings of ACL 2024, pp. 3563–3578 (2024), https://aclanthology.org/2024.findings-acl.212

[3] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474 (2020), https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf

Funded by:
Funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
Funded by the Free State of Saxony (Freistaat Sachsen).