Analysing Chain-of-Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?

Abstract

Recent work has shown that Chain-of-Thought (CoT) often yields limited gains on soft-reasoning problems such as analytical and commonsense reasoning. CoT can also be unfaithful to a model's actual reasoning process. We investigate the dynamics and faithfulness of CoT in soft-reasoning tasks across instruction-tuned, reasoning, and reasoning-distilled models. Our findings reveal differences in how these models rely on CoT, and show that CoT influence and faithfulness are not always aligned.