Explaining Sources of Uncertainty in Automated Fact-Checking

Abstract

Understanding the sources of a model's uncertainty about its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), which do not explain uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying, in an unsupervised way, relationships between text spans that expose claim-evidence or inter-evidence conflicts and agreements driving the model's predictive uncertainty, and (ii) generating explanations, via prompting and attention steering, that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show that CLUE produces explanations that are more faithful to the model's uncertainty and more consistent with fact-checking decisions than prompting for uncertainty explanations without span-interaction guidance. Human evaluators judge our explanations to be more helpful, more informative, less redundant, and more logically consistent with the input than this baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and generalises readily to other tasks that require reasoning over complex information.
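The abstract does not spell out how CLUE scores span interactions. To make the idea of span pairs "driving predictive uncertainty" concrete, here is a minimal sketch that uses a generic, swapped-in attribution technique rather than CLUE's own procedure. It measures uncertainty as the entropy of the verdict distribution and scores each span pair with a second-order occlusion interaction. The `Predictor` interface, `toy_predict`, its label set, and the span indices are all hypothetical. In a real white-box setting, `predict` would re-run the language model with the masked spans removed.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# Hypothetical interface (assumption): given the indices of the spans kept in
# the input, return a probability distribution over verdict labels.
Predictor = Callable[[FrozenSet[int]], Sequence[float]]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a label distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pairwise_interactions(predict: Predictor, n_spans: int) -> Dict[Tuple[int, int], float]:
    """Second-order occlusion interaction of each span pair on uncertainty.

    I(i, j) = U(all) - U(all - {i}) - U(all - {j}) + U(all - {i, j})
    A nonzero score means the two spans jointly change the model's uncertainty
    beyond their separate effects. No gold labels are needed, only the model.
    """
    full = frozenset(range(n_spans))

    def u(kept: FrozenSet[int]) -> float:
        return predictive_entropy(predict(kept))

    u_full = u(full)
    scores: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(n_spans), 2):
        scores[(i, j)] = u_full - u(full - {i}) - u(full - {j}) + u(full - {i, j})
    return scores


def toy_predict(kept: FrozenSet[int]) -> Tuple[float, float]:
    """Hypothetical stand-in model over labels (SUPPORTED, REFUTED).

    Span 0 = claim, 1 = supporting evidence, 2 = refuting evidence,
    3 = irrelevant text.
    """
    if 1 in kept and 2 in kept:
        return (0.5, 0.5)  # conflicting evidence: maximal uncertainty
    if 1 in kept:
        return (0.9, 0.1)
    if 2 in kept:
        return (0.1, 0.9)
    return (0.6, 0.4)  # claim alone: mild prior


if __name__ == "__main__":
    scores = pairwise_interactions(toy_predict, n_spans=4)
    top_pair = max(scores, key=scores.get)
    print("most uncertainty-driving span pair:", top_pair)
```

With the toy model, the supporting and refuting evidence spans (1, 2) receive the largest positive score, while pairs involving only the claim or the irrelevant span score zero. Full occlusion needs O(n²) forward passes, so a practical system would likely prune candidate spans first. That cost trade-off is the main design consideration for this particular technique.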