Interpretability

We are interested in developing interpretable models. An interpretable model exposes the means to trace the process that leads from an input to a prediction. We mainly focus on interpretability by design in text classification.

Current topics of interest:

Selective Rationalization:
Selective rationalization is the task of learning to provide highlights as explanations. Highlights are subsets of the input text that are meant to be interpretable to a user and to faithfully describe the inference process of a classification model. A popular architecture for selective rationalization is the Select-then-Predict Pipeline (SPP), in which a generator selects a rationale that is then fed to a predictor. SPPs have been shown to suffer from local minima arising from suboptimal interplay between the generator and the predictor, a phenomenon known as interlocking.
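The generator-predictor interplay above can be sketched with toy components. This is a minimal illustration, not the trained architecture: the scorer and the predictor below are hypothetical lexicon-based stand-ins, whereas in an actual SPP both are learned models optimized jointly.

```python
from typing import Callable, List, Tuple

def select_then_predict(
    tokens: List[str],
    score: Callable[[str], float],        # generator: per-token relevance score
    predict: Callable[[List[str]], str],  # predictor: classifies the rationale only
    k: int = 2,
) -> Tuple[List[str], str]:
    """Select the k highest-scoring tokens as the rationale (kept in their
    original order), then feed only those tokens to the predictor."""
    top = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)[:k]
    rationale = [tokens[i] for i in sorted(top)]
    return rationale, predict(rationale)

# Toy lexicon-based components (assumptions for illustration, not a trained model).
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"awful", "boring", "hate"}

def toy_score(tok: str) -> float:
    return 1.0 if tok in POSITIVE | NEGATIVE else 0.0

def toy_predict(rationale: List[str]) -> str:
    pos = sum(t in POSITIVE for t in rationale)
    neg = sum(t in NEGATIVE for t in rationale)
    return "positive" if pos >= neg else "negative"

rationale, label = select_then_predict(
    "the plot was boring but the acting was great".split(),
    toy_score, toy_predict, k=2,
)
print(rationale, label)  # the highlight is the only evidence the predictor sees
```

Because the predictor only ever sees what the generator selects, a degenerate selection can trap both modules in a mutually reinforcing local minimum, which is the interlocking problem mentioned above.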

Knowledge Extraction:
The process of extracting interpretable knowledge from data-driven models. Our aim is to distill knowledge shared across several examples when addressing a downstream task.


Knowledge Extraction from Rationalization

Define ways to go from a local explanation (i.e., a rationale) to a global explanation (i.e., a knowledge base) by aggregating and summarizing extracted rationales.
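One simple way to aggregate local explanations into a global one is to count how often each highlighted token supports a given class across examples. This is a hypothetical sketch of the aggregation step only; the example rationales are invented for illustration.

```python
from collections import Counter

# Per-example rationales (local explanations), grouped by predicted class.
# These are made-up examples standing in for rationales extracted by a model.
rationales = {
    "positive": [["great", "acting"], ["great", "plot"], ["love"]],
    "negative": [["boring", "plot"], ["awful"]],
}

# Global explanation: the most frequently highlighted tokens per class.
global_explanation = {
    label: Counter(tok for r in rs for tok in r).most_common(2)
    for label, rs in rationales.items()
}
print(global_explanation)
```

More elaborate summarization (e.g., over phrases instead of tokens) follows the same local-to-global aggregation pattern.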

Mixture of Experts for Rationalization

Understand whether a Mixture-of-Experts (MoE) model for selective rationalization can mitigate interlocking.

Rationalization via LLMs

Evaluate the capability of LLMs to perform selective rationalization via prompting.

Structured Rationalization via Tree Kernel Methods

Apply rationalization in tree-kernel settings while also enforcing structural constraints that depend on the given application scenario.