Text Classification with Guidelines Only
Description:
The standard approach for training a machine learning model on a task is to provide an annotated dataset $(\mathcal{X}, \mathcal{Y})$.
The dataset is built by providing unlabeled data $\mathcal{X}$ to a group of annotators previously trained on a set of annotation guidelines $\mathcal{G}$.
Annotators assign each sample in $\mathcal{X}$ a label from a given class set $\mathcal{C}$, producing the labels $\mathcal{Y}$.
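The pipeline above can be sketched as a toy example. Human annotators are simulated by a keyword rule, which is a strong simplification. The class names, the guideline text and the rule are all hypothetical. The point is that $\mathcal{G}$ is used to produce $\mathcal{Y}$ but is not part of the released dataset $(\mathcal{X}, \mathcal{Y})$.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical class set C and guidelines G. Each guideline pairs a textual
# description with a rule that a (simulated) annotator applies.
CLASSES: List[str] = ["complaint", "other"]
GUIDELINES: Dict[str, Tuple[str, Callable[[str], bool]]] = {
    "complaint": ("The text reports a problem with a product or service.",
                  lambda x: "broken" in x or "refund" in x),
}

def annotate(x: str) -> str:
    """Simulated annotator: maps x to a class in C by consulting G."""
    for label, (_description, applies) in GUIDELINES.items():
        if applies(x):
            return label
    return "other"  # default class when no guideline applies

X = ["my phone arrived broken", "great weather today", "I want a refund"]
Y = [annotate(x) for x in X]

# The released dataset keeps only (X, Y); the guidelines G are dropped.
dataset = list(zip(X, Y))
print(dataset)
```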
The main issue with this approach is that annotators define the mapping from data $\mathcal{X}$ to the class set $\mathcal{C}$ through the guidelines $\mathcal{G}$, while machine learning models are trained to learn the same mapping without access to $\mathcal{G}$.
Consequently, these models are free to learn whichever mapping from $\mathcal{X}$ to $\mathcal{C}$ best fits the given data, even one that disagrees with the guidelines $\mathcal{G}$.
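A toy illustration of this issue follows. The training data and both rules are hypothetical. Two mappings fit the labeled data equally well, but only one follows the (hypothetical) guideline. Labels alone cannot tell them apart, yet they disagree on an unseen input.

```python
# Hypothetical training set in which every complaint also happens to end with "!".
train = [("my phone arrived broken!", "complaint"),
         ("I want a refund!", "complaint"),
         ("great weather today", "other"),
         ("see you tomorrow", "other")]

def guideline_mapping(x: str) -> str:
    # Mapping intended by the guidelines: look for signs of a product problem.
    return "complaint" if ("broken" in x or "refund" in x) else "other"

def shortcut_mapping(x: str) -> str:
    # Alternative mapping that exploits a surface cue present in the data.
    return "complaint" if x.endswith("!") else "other"

def accuracy(f, data) -> float:
    return sum(f(x) == y for x, y in data) / len(data)

print(accuracy(guideline_mapping, train), accuracy(shortcut_mapping, train))  # 1.0 1.0

unseen = "what a lovely day!"
print(guideline_mapping(unseen), shortcut_mapping(unseen))  # other complaint
```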
Our idea is to provide the guidelines $\mathcal{G}$ directly to models, without giving them access to class labels during training.
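One way to make the idea concrete is sketched below under strong simplifying assumptions. Each class in $\mathcal{C}$ is represented only by its guideline text, and an input is assigned to the class whose guideline is most similar to it. Similarity here is bag-of-words cosine similarity. This zero-shot baseline is a swapped-in technique for illustration only, not the method proposed here. The guideline texts and class names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict

# Hypothetical guidelines G: one textual description per class in C.
GUIDELINES: Dict[str, str] = {
    "complaint": "the text reports a problem such as a broken product or a refund request",
    "other": "the text does not report any problem, for example small talk about the weather",
}

def bag_of_words(text: str) -> Counter:
    # Crude tokenization: lowercase alphabetic tokens only.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(x: str) -> str:
    """Pick the class whose guideline description is most similar to x.
    No labeled examples are used: the only supervision is G."""
    x_vec = bag_of_words(x)
    return max(GUIDELINES, key=lambda c: cosine(x_vec, bag_of_words(GUIDELINES[c])))

print(classify("my phone arrived broken"))  # complaint
print(classify("lovely weather today"))     # other
```

A more realistic sketch would likely replace the bag-of-words similarity with a pretrained text encoder or language model. The structure stays the same: the model sees $\mathcal{G}$ and never sees $\mathcal{Y}$.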
Contact: Federico Ruggeri
References:
Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology
Federico Ruggeri, Eleonora Misino, Arianna Muti, Katerina Korre, Paolo Torroni, Alberto Barrón-Cedeño
September 2024