Explaining Sources of Uncertainty in Automated Fact-Checking

Abstract

Understanding the sources of a model's uncertainty about its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), neither of which explains uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying, in an unsupervised way, relationships between spans of text that expose the claim-evidence or inter-evidence conflicts and agreements driving the model's predictive uncertainty, and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show that CLUE produces explanations that are more faithful to the model's uncertainty and more consistent with fact-checking decisions than prompting for uncertainty explanations without span-interaction guidance. Human evaluators judge our explanations to be more helpful, more informative, less redundant, and more logically consistent with the input than this baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and generalises readily to other tasks that require reasoning over complex information.
