Measuring Faithfulness in Chain-of-Thought Reasoning

Abstract

Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear whether the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful by examining how the model's predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times largely ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone, nor from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful when circumstances such as model size and task are carefully chosen.
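The intervention idea described in the abstract can be illustrated with a minimal sketch: perturb the CoT, ask for the answer again, and count how often the answer changes. Everything named here is an assumption made for illustration, not the paper's code: `answer_change_rate`, `corrupt_last_step`, and the two toy "models" are hypothetical stand-ins, and the paper's own way of generating mistakes is not reproduced.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: (question, chain_of_thought) -> final answer string.
AnswerFn = Callable[[str, str], str]


def answer_change_rate(
    answer_fn: AnswerFn,
    examples: Sequence[Tuple[str, str]],
    intervene: Callable[[str], str],
) -> float:
    """Fraction of examples whose final answer changes after the CoT is perturbed."""
    if not examples:
        raise ValueError("examples must be non-empty")
    changed = 0
    for question, cot in examples:
        original = answer_fn(question, cot)
        perturbed = answer_fn(question, intervene(cot))
        changed += int(original != perturbed)
    return changed / len(examples)


def corrupt_last_step(cot: str) -> str:
    """Toy 'add a mistake' intervention: overwrite the final step with a wrong one."""
    steps = cot.split("\n")
    steps[-1] = "equals 0"
    return "\n".join(steps)


def toy_model_uses_cot(question: str, cot: str) -> str:
    """Toy stand-in for a model that reads its answer off the last CoT token."""
    tokens = cot.split()
    return tokens[-1] if tokens else ""


def toy_model_ignores_cot(question: str, cot: str) -> str:
    """Toy stand-in for a model whose answer depends only on the question."""
    return str(len(question))


EXAMPLES = [
    ("What is 2 + 3?", "2 plus 3\nequals 5"),
    ("What is 1 + 1?", "1 plus 1\nequals 2"),
]

if __name__ == "__main__":
    print(answer_change_rate(toy_model_uses_cot, EXAMPLES, corrupt_last_step))
    print(answer_change_rate(toy_model_ignores_cot, EXAMPLES, corrupt_last_step))
```

In this toy setting, a high change rate means the answer is conditioned on the CoT, while a rate near zero means the CoT is being ignored. A real measurement would replace the toy functions with calls to an actual model.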
