Reliable Chain-of-Thought via Prefix Consistency
May 8, 2026
Authors: Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama
cs.AI
Abstract
Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We exploit this gap as a reliability signal, prefix consistency, which weights each candidate answer by how often it reappears under regeneration; it requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches the accuracy plateau of standard MV with up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
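The aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `regenerate` callable (standing in for a model call that completes a truncated chain of thought), the halfway truncation point, and the resampling count are all assumptions made for the example.

```python
from collections import Counter

def prefix_consistency_vote(traces, regenerate, n_resamples=4):
    """Weight each candidate answer by how often it reappears when its
    CoT trace is truncated partway and the remainder is regenerated.

    traces:      list of (cot_text, answer) pairs from independent samples
    regenerate:  callable prefix -> answer; resamples a completion from a
                 truncated chain of thought (hypothetical model call)
    """
    weights = Counter()
    for cot, answer in traces:
        prefix = cot[: len(cot) // 2]          # truncate the CoT partway
        hits = sum(regenerate(prefix) == answer for _ in range(n_resamples))
        weights[answer] += hits / n_resamples  # prefix-consistency weight
    # Reweighted majority vote: answer with the largest total weight wins
    return weights.most_common(1)[0][0]
```

Under this scheme a trace whose answer rarely survives regeneration contributes little to the vote, whereas standard majority voting would count it fully.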