Detecting and Mitigating Hate Speech with Large Language Models: Evaluating Strategies and Ethical Considerations
Status: open / Type of Theses: Bachelor Theses, Master Theses / Location: Dresden
The proliferation of hate speech online poses significant challenges for automated systems that aim to detect and mitigate harmful content. Large Language Models (LLMs) have shown potential in recognizing hate speech, but bias, false positives, and the difficulty of understanding context remain open problems. Addressing these concerns requires a thorough evaluation of detection strategies and a nuanced approach that ensures both accuracy and fairness.
This thesis will investigate methods to improve hate speech detection and explore counter-speech strategies, focusing on different prompting techniques and model fine-tuning approaches. Additionally, the research will consider ethical implications, including how LLMs handle sensitive topics and how biases affect model performance.
Research Questions
- How do different prompting strategies affect the accuracy and fairness of hate speech detection by LLMs?
- What techniques can be employed to minimize false positives and improve context-aware detection of hate speech?
- How effective are counter-speech generation methods, and what are their limitations in reducing the impact of hate speech?
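As a rough illustration of how the first two questions could be made measurable, the sketch below compares binary predictions against gold labels and reports accuracy, the overall false positive rate, and the false positive rate per group. It is only a minimal sketch under stated assumptions: the function names, the group attribute, and the toy data are hypothetical and not part of the thesis description, and a per-group false positive rate is just one of several possible fairness indicators.

```python
from collections import defaultdict

def false_positive_rate(preds, labels):
    """Share of non-hateful items (label 0) that were flagged as hateful (pred 1)."""
    negatives = [p for p, y in zip(preds, labels) if y == 0]
    if not negatives:
        return None  # undefined when there are no non-hateful items
    return sum(negatives) / len(negatives)

def evaluate(preds, labels, groups):
    """Return accuracy, overall FPR, and FPR per group for binary predictions."""
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    by_group = defaultdict(lambda: ([], []))
    for p, y, g in zip(preds, labels, groups):
        by_group[g][0].append(p)
        by_group[g][1].append(y)
    fpr_per_group = {
        g: false_positive_rate(ps, ys) for g, (ps, ys) in by_group.items()
    }
    return {
        "accuracy": accuracy,
        "fpr": false_positive_rate(preds, labels),
        "fpr_per_group": fpr_per_group,
    }

# Hypothetical toy data: 1 = hateful, 0 = not hateful.
# "groups" stands in for any attribute of interest (e.g. targeted community or dialect).
labels = [1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
zero_shot = [1, 1, 0, 1, 0, 0]  # made-up predictions from one prompting strategy

result = evaluate(zero_shot, labels, groups)
print(result)
```

Running the same function on the predictions of each prompting strategy would give directly comparable numbers; a large gap between groups in `fpr_per_group` would point to the kind of bias the thesis is meant to examine.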
Prerequisites
- Strong programming skills in Python
- Familiarity with natural language processing and LLM architectures
- Interest in AI ethics and computational social science
References
- Sarah Masud, Sahajpreet Singh, Viktor Hangya, Alexander Fraser, and Tanmoy Chakraborty. 2024. Hate Personified: Investigating the role of LLMs in content moderation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15847–15863, Miami, Florida, USA. Association for Computational Linguistics.
- Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, and Alice Oh. 2024. Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4205–4224, Mexico City, Mexico. Association for Computational Linguistics.