Reliable Chain-of-Thought via Prefix Consistency

Abstract

Large language models often achieve higher accuracy on reasoning tasks when multiple Chain-of-Thought (CoT) traces are sampled and aggregated by majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, which weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities and no self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches the plateau accuracy of standard MV with up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
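The abstract does not give the exact scoring rule, so the following is only a minimal sketch of the idea under stated assumptions, contrasting plain majority voting with a prefix-consistency-weighted vote. The names `regenerate`, `cut_fraction`, and `num_regens`, truncation by character fraction, and summing per-trace scores onto each answer are all hypothetical choices made for illustration. They are not the interface or the aggregation used in the linked repository, and `toy_regenerate` is a deterministic stand-in for a real model call.

```python
from collections import Counter, defaultdict
from collections.abc import Callable, Hashable, Sequence
from itertools import cycle


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Standard self-consistency: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def prefix_consistency(
    trace: str,
    answer: Hashable,
    regenerate: Callable[[str], Hashable],
    cut_fraction: float = 0.5,
    num_regens: int = 4,
) -> float:
    """Fraction of regenerations from a truncated prefix that reproduce `answer`.

    `regenerate` is a hypothetical callable that continues a CoT prefix and
    returns the extracted final answer. Cutting by character fraction is an
    illustrative assumption; a real system would more likely cut at tokens.
    """
    prefix = trace[: int(len(trace) * cut_fraction)]
    hits = sum(regenerate(prefix) == answer for _ in range(num_regens))
    return hits / num_regens


def weighted_vote(
    traces: Sequence[str],
    answers: Sequence[Hashable],
    regenerate: Callable[[str], Hashable],
    **kwargs,
) -> Hashable:
    """Add each trace's prefix-consistency score to its answer; return the top answer."""
    scores: defaultdict = defaultdict(float)
    for trace, answer in zip(traces, answers):
        scores[answer] += prefix_consistency(trace, answer, regenerate, **kwargs)
    return max(scores, key=scores.get)


# --- Toy demo with a deterministic stand-in for the model (hypothetical) ---
_wrong_answers = cycle([7, 3, 5, 9])


def toy_regenerate(prefix: str) -> Hashable:
    # "Sound" prefixes always lead back to 42; "shaky" ones drift between answers.
    return 42 if prefix.startswith("sound") else next(_wrong_answers)


traces = ["shaky guess, answer 7"] * 3 + ["sound reasoning, answer 42"] * 2
answers = [7, 7, 7, 42, 42]
print(majority_vote(answers))                          # 7
print(weighted_vote(traces, answers, toy_regenerate))  # 42
```

In the toy run, the three "shaky" traces outvote the two "sound" ones under plain MV. Each shaky trace reproduces its answer in only 1 of 4 regenerations (score 0.25), while each sound trace reproduces it every time (score 1.0), so the weighted vote flips to 42. Regeneration adds its own token cost, so whether the trade pays off in practice depends on settings such as the cut point and the number of regenerations.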
