Claim Detection in Sensitive and Multilingual Domains
Status: open / Type of Theses: Bachelor Theses, Master Theses, PhD Theses / Location: Dresden
This project focuses on the automatic identification of verifiable, check-worthy claims in text, a foundational task in computational fact-checking and misinformation analysis. While claim detection has advanced significantly for English news and political discourse, it remains underexplored in sensitive domains (legal, clinical, financial) and in low-resource languages.
A core challenge is that most existing models treat claim detection as a simple binary classification task, ignoring the context in which a claim appears: who makes it, in what type of document, and for what audience. This project will investigate whether and how contextual signals (document type, speaker role, domain ontology, and discourse structure) can improve both the accuracy and the interpretability of claim detection models.
The student will survey existing datasets and benchmarks (e.g., ClaimBuster, CLEF CheckThat!), implement and compare baseline models ranging from fine-tuned transformer classifiers to LLM-based zero-shot approaches, and explore at least one of the following research directions:
- Domain adaptation: how well do models trained on political speech transfer to legal or clinical text?
- Multilingual claim detection: developing or extending datasets for Hebrew or other low-resource languages.
- Privacy-aware claim detection: investigating the interaction between text anonymization and claim detectability — specifically, whether masking or replacing named entities degrades a model’s ability to identify and rank claims.
- LLM-based detection: evaluating the check-worthiness ranking capabilities of instruction-tuned LLMs under zero-shot and few-shot prompting, with analysis of failure modes such as hallucination and inconsistency.
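For the privacy-aware direction, the core experiment can be framed as: mask entity mentions, rescore the same sentences with the same model, and compare a ranking metric before and after masking. The following is a minimal sketch under explicit assumptions, not a prescribed protocol. Entity spans are assumed to be given (e.g. from gold annotations or an NER tagger, which is not included here); masking is naive string replacement with a hypothetical `[ENT]` placeholder; average precision is used as one common ranking metric and should be swapped for whatever the chosen benchmark prescribes; and `mask_entities`, `average_precision`, `anonymization_gap` and `toy_scorer` are all hypothetical helpers, with `toy_scorer` standing in for a real claim detector.

```python
from typing import Callable, Sequence


def mask_entities(text: str, entities: Sequence[str], placeholder: str = "[ENT]") -> str:
    """Naively replace every occurrence of each given entity string with a placeholder."""
    # Replace longer entities first so a shorter entity that is a substring
    # of a longer one does not break the longer match.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, placeholder)
    return text


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of the ranking induced by `scores` (label 1 = check-worthy)."""
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / hits if hits else 0.0


def anonymization_gap(
    texts: Sequence[str],
    entities: Sequence[Sequence[str]],
    labels: Sequence[int],
    scorer: Callable[[str], float],
) -> tuple[float, float]:
    """Return (AP on original texts, AP on entity-masked texts) for the same scorer."""
    original = average_precision([scorer(t) for t in texts], labels)
    masked_texts = [mask_entities(t, ents) for t, ents in zip(texts, entities)]
    masked = average_precision([scorer(t) for t in masked_texts], labels)
    return original, masked


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for a claim detector: rewards digits and capitalised tokens."""
    tokens = text.split()
    digits = sum(ch.isdigit() for ch in text)
    # Skip the first token, which is capitalised simply because it starts the sentence.
    capitalised = sum(tok[:1].isupper() for tok in tokens[1:])
    return digits + 0.5 * capitalised


if __name__ == "__main__":
    texts = [
        "We must do better as a country.",
        "Thank you, Maria, for coming.",
        "Smith cut taxes for Acme Corp.",
    ]
    entities = [[], ["Maria"], ["Smith", "Acme Corp"]]
    labels = [0, 0, 1]
    print(anonymization_gap(texts, entities, labels, toy_scorer))  # → (1.0, 0.3333333333333333)
```

The toy data is constructed only to exercise the code path: because `toy_scorer` relies on capitalised tokens, masking collapses all scores to zero and the check-worthy sentence falls to the bottom of the ranking. In a real study, `toy_scorer` would be replaced by a fine-tuned classifier's positive-class probability or an LLM-derived score, and the placeholder strategy (a generic tag, typed tags, or realistic surrogate names) is itself one of the variables worth comparing, since the direction covers both masking and replacing entities.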
References
- Nakov, P., Barrón-Cedeño, A., Da San Martino, G., Alam, F., et al., 2022. Overview of the CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Proceedings of CLEF 2022.
- Konstantinovskiy, L., Price, O., Babakar, M. and Zubiaga, A., 2021. Toward automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection. Digital Threats: Research and Practice, 2(2), Article 14.
- Wührl, A. and Klinger, R., 2021. Claim detection in biomedical Twitter posts. In Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 131–142.
- Guo, Z., Schlichtkrull, M. and Vlachos, A., 2022. A survey on automated fact-checking. Transactions of the Association for Computational Linguistics, 10, pp.178–206.