Reasoning Models Better Express Their Confidence

Abstract

Despite their strengths, large language models (LLMs) often fail to communicate their confidence accurately, which makes it difficult to assess when they might be wrong and limits their reliability. In this work, we demonstrate that reasoning models, that is, LLMs that engage in extended chain-of-thought (CoT) reasoning, perform better not only at problem-solving but also at accurately expressing their confidence. Specifically, we benchmark six reasoning models across six datasets and find that they achieve strictly better confidence calibration than their non-reasoning counterparts in 33 of the 36 settings. Our detailed analysis reveals that these gains in calibration stem from the slow thinking behaviors of reasoning models, such as exploring alternative approaches and backtracking. These behaviors enable the models to adjust their confidence dynamically throughout their CoT, making it progressively more accurate. In particular, we find that reasoning models become increasingly better calibrated as their CoT unfolds, a trend not observed in non-reasoning models. Moreover, removing slow thinking behaviors from the CoT leads to a significant drop in calibration. Lastly, we show that these gains are not exclusive to reasoning models: non-reasoning models also benefit when guided to perform slow thinking via in-context learning.
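For readers unfamiliar with the term, "confidence calibration" asks whether a model's stated confidence matches how often it is actually correct. The abstract does not say which metric is used, so the following is only a minimal sketch of one common measure, the expected calibration error (ECE). The function name, the equal-width binning, and the default of 10 bins are illustrative assumptions, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean gap between confidence and accuracy.

    confidences: stated confidences in [0, 1], one per answer.
    correct: booleans, True where the corresponding answer was right.
    n_bins: number of equal-width confidence bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width bins by stated confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its |accuracy - confidence| gap, weighted by its share.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece
```

Under this measure, a lower value means better calibration: a model that says it is fully confident but is right only a quarter of the time has a large gap, while a model whose confidence tracks its accuracy scores near zero.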
