Multi-cultural Abusive and Hate Speech Detection

Mon, 02 Mar 2026 00:00:00 +0000

Description:
What counts as abusive or hate speech depends on the socio-cultural context. The same text might be considered offensive in one culture, acceptable in another, and, in the most extreme case, legally prosecutable in a third. Our aim is to evaluate how machine learning models are affected by different definitions of abusive and hate speech, in order to raise awareness of this issue when developing accurate abusive speech detection systems.
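One rough way to quantify how far two definitions diverge, before any model is trained, is to compare the labels they induce on the same texts. The sketch below assumes binary labels and uses Cohen's kappa as the agreement measure. The definition names and label vectors are invented for illustration and are not project data.

```python
from itertools import combinations

# Hypothetical labels (1 = abusive, 0 = not abusive) assigned to the same five
# texts under three illustrative definitions; invented, not project data.
labels_by_definition = {
    "definition_A": [1, 0, 1, 1, 0],
    "definition_B": [1, 0, 0, 1, 0],
    "definition_C": [0, 0, 0, 1, 1],
}


def cohen_kappa(a, b):
    """Cohen's kappa between two binary labelings of the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # share of items labeled abusive under labeling a
    p_b = sum(b) / n  # share of items labeled abusive under labeling b
    # Agreement expected by chance given each labeling's label distribution.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:  # both labelings are the same constant label
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(labelings):
    """Kappa for every unordered pair of definitions."""
    return {
        (u, v): cohen_kappa(labelings[u], labelings[v])
        for u, v in combinations(sorted(labelings), 2)
    }


for pair, kappa in pairwise_kappa(labels_by_definition).items():
    print(pair, round(kappa, 3))
```

A low or negative kappa between two definitions suggests that a model trained on labels from one would be evaluated against a substantially different target under the other.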

Contact: Federico Ruggeri, Katerina Korre, Arianna Muti

References:

Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains.
Katerina Korre, Arianna Muti, Federico Ruggeri, and Alberto Barrón-Cedeño. 2025.
In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3184–3198, Albuquerque, New Mexico. Association for Computational Linguistics.

Text Classification with Guidelines Only

Mon, 02 Mar 2026 00:00:00 +0000

Description:
The standard approach for training a machine learning model on a task is to provide an annotated dataset $(\mathcal{X}, \mathcal{Y})$. The dataset is built by giving unlabeled data $\mathcal{X}$ to a group of annotators who have previously been trained on a set of annotation guidelines $\mathcal{G}$. Annotators label the data $\mathcal{X}$ using a given class set $\mathcal{C}$. The main issue with this approach is that annotators define the mapping from data $\mathcal{X}$ to the class set $\mathcal{C}$ via the guidelines $\mathcal{G}$, whereas machine learning models are trained to learn the same mapping without access to $\mathcal{G}$. Consequently, these models can learn any mapping from $\mathcal{X}$ to $\mathcal{C}$ that best fits the given data. Our idea is to provide the guidelines $\mathcal{G}$ directly to models, without any access to class labels during training.
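A minimal sketch of what "guidelines only" could look like, assuming each class in $\mathcal{C}$ is described by a short guideline text and using a deliberately simple bag-of-words cosine similarity in place of a learned model. The guideline texts, the class names, and the function `classify_with_guidelines` are hypothetical. This is not the method of the referenced work, only an illustration that the mapping from $\mathcal{X}$ to $\mathcal{C}$ can be driven by $\mathcal{G}$ with no labeled examples.

```python
import math
import re
from collections import Counter

# Hypothetical guidelines G: one short definition per class in C.
# Illustrative text only; not taken from any real annotation guideline.
GUIDELINES = {
    "abusive": "insults slurs threats or demeaning language targeting a person or group",
    "not_abusive": "neutral informative or friendly language without insults or threats",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify_with_guidelines(text, guidelines=GUIDELINES):
    """Assign x to the class whose guideline text is most similar.

    No labeled pairs (x, y) are used: the only supervision is the guidelines.
    """
    x_vec = Counter(tokenize(text))
    scores = {c: cosine(x_vec, Counter(tokenize(g))) for c, g in guidelines.items()}
    return max(scores, key=scores.get), scores
```

For example, `classify_with_guidelines("these threats and insults are demeaning")` selects `"abusive"` because the input shares more terms with that class's guideline. A realistic system would replace token overlap with a model able to reason over the guideline text itself.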

Contact: Federico Ruggeri

References:

Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology.
Federico Ruggeri, Eleonora Misino, Arianna Muti, Katerina Korre, Paolo Torroni, and Alberto Barrón-Cedeño. September 2024.

Unstructured Knowledge Integration | Language Technologies Lab
