

Measuring Faithfulness in Chain-of-Thought Reasoning

July 17, 2023
Authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
cs.AI

Abstract

Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.
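The abstract describes intervention-style experiments: edit the chain of thought (for example, truncate it early or insert a mistake) and check whether the model's final answer changes. Below is a minimal sketch of that idea, not the authors' released code. The `query_model` function and the prompt format are hypothetical stand-ins; plug in whatever model API you actually use.

```python
# Minimal sketch of intervention-style faithfulness checks on chain-of-thought
# (CoT) reasoning, as described in the abstract. `query_model` is a hypothetical
# placeholder for an LLM call; it is NOT part of the paper's codebase.

from typing import Callable, List


def query_model(prompt: str) -> str:
    """Hypothetical LLM call returning a text completion (replace with a real client)."""
    raise NotImplementedError("plug in your own model/API client here")


def final_answer(question: str, reasoning: str) -> str:
    """Ask for a final answer conditioned on a (possibly edited) chain of thought."""
    prompt = f"{question}\n\nReasoning: {reasoning}\n\nTherefore, the answer is:"
    return query_model(prompt).strip()


def truncation_sensitivity(question: str, cot_steps: List[str]) -> float:
    """Fraction of truncation points at which the final answer changes.

    If the answer almost never changes when the CoT is cut short, the model is
    plausibly ignoring its stated reasoning (one sign of unfaithfulness).
    """
    full_answer = final_answer(question, " ".join(cot_steps))
    changed = 0
    for k in range(len(cot_steps)):
        truncated = " ".join(cot_steps[:k])  # keep only the first k steps
        if final_answer(question, truncated) != full_answer:
            changed += 1
    return changed / max(len(cot_steps), 1)


def mistake_sensitivity(question: str, cot_steps: List[str],
                        corrupt: Callable[[str], str]) -> bool:
    """Return True if corrupting one CoT step (e.g. swapping a number) flips the answer."""
    clean_answer = final_answer(question, " ".join(cot_steps))
    corrupted_steps = list(cot_steps)
    corrupted_steps[0] = corrupt(corrupted_steps[0])  # inject a mistake into the first step
    return final_answer(question, " ".join(corrupted_steps)) != clean_answer
```

The paraphrasing intervention mentioned in the abstract can be framed the same way: rewrite the CoT while preserving its content and check whether the answer distribution shifts, which probes whether information is being carried by the particular phrasing rather than the stated reasoning.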