
Supervisor

Shuzhou Yuan

Chair of Scalable Software Architectures for Data Analytics

TUD Dresden University of Technology

shuzhou.yuan@tu-dresden.de

Detecting and Mitigating Hate Speech with Large Language Models: Evaluating Strategies and Ethical Considerations

Status: open / Type of thesis: Bachelor's thesis, Master's thesis / Location: Dresden

The proliferation of hate speech online poses significant challenges for automated systems that aim to detect and mitigate harmful content. Large Language Models (LLMs) have shown potential in recognizing hate speech, but problems such as bias, false positives, and limited understanding of context persist. Addressing them requires a thorough evaluation of detection strategies with attention to both accuracy and fairness.

This thesis will investigate methods to improve hate speech detection, focusing on different prompting techniques and model fine-tuning approaches, and will explore counter-strategies such as counter-speech generation. The research will also consider ethical implications, including how LLMs handle sensitive topics and how biases affect model performance.

Research Questions

How do different prompting strategies affect the accuracy and fairness of hate speech detection by LLMs?
What techniques can be employed to minimize false positives and improve context-aware detection of hate speech?
How effective are counter-speech generation methods, and what are their limitations in reducing the impact of hate speech?
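
To make the first two questions more concrete, the following is a minimal sketch of how accuracy and one possible fairness signal (the gap in false positive rates between groups) could be computed from model predictions. It is only an illustration under stated assumptions: the function names, the binary label encoding (1 = hate speech, 0 = not hate speech), the group tags, and the toy data are all hypothetical and not part of the thesis description, and a thesis may well choose different metrics.

```python
from collections import defaultdict

def accuracy(labels, preds):
    """Fraction of examples where the prediction matches the gold label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def false_positive_rate_by_group(labels, preds, groups):
    """False positive rate per group: the share of non-hateful texts
    (label 0) that were wrongly flagged as hate speech (prediction 1)."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            negatives[g] += 1
            if p == 1:
                false_positives[g] += 1
    # Only groups with at least one non-hateful example appear in the result.
    return {g: false_positives[g] / negatives[g] for g in negatives}

def fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: gold labels, model predictions, and a group tag
# per text (e.g. the dialect or community a text is associated with).
labels = [0, 0, 1, 1, 0, 0, 1, 0]
preds  = [0, 1, 1, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = false_positive_rate_by_group(labels, preds, groups)
print("accuracy:", accuracy(labels, preds))
print("false positive rate by group:", rates)
print("FPR gap:", fpr_gap(rates))
```

Running the same evaluation over the outputs of different prompting strategies would allow them to be compared on both dimensions at once. The false positive rate gap is only one of several possible fairness measures.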

Prerequisites

Strong programming skills in Python
Familiarity with natural language processing and LLM architectures
Interest in AI ethics and computational social science

References

Sarah Masud, Sahajpreet Singh, Viktor Hangya, Alexander Fraser, and Tanmoy Chakraborty. 2024. Hate Personified: Investigating the role of LLMs in content moderation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15847–15863, Miami, Florida, USA. Association for Computational Linguistics.
Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, and Alice Oh. 2024. Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4205–4224, Mexico City, Mexico. Association for Computational Linguistics.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.