
Reliable Chain-of-Thought via Prefix Consistency

May 8, 2026
Authors: Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama
cs.AI

Abstract

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, which weights each candidate answer by how often it reappears under regeneration. The method requires no access to token log-probabilities and no self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches standard MV's plateau accuracy with up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
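To make the voting scheme concrete, here is a minimal Python sketch of prefix-consistency weighted voting as the abstract describes it. This is not the paper's released implementation: `generate(prompt)` (samples a model continuation) and `extract_answer(text)` (parses a final answer from a completed trace) are hypothetical placeholder callables, and `truncate_frac` and `n_regen` are assumed illustrative parameters.

```python
from collections import defaultdict

def prefix_consistency_vote(question, traces, generate, extract_answer,
                            truncate_frac=0.5, n_regen=4):
    """Weighted majority vote in which each trace's vote is weighted by
    how often its answer reappears when the CoT prefix is kept and the
    remainder is regenerated (the "prefix consistency" signal).

    traces: list of (cot_text, answer) pairs from the initial sampling.
    generate / extract_answer: hypothetical model-sampling and
    answer-parsing callables, not part of the paper's code.
    """
    scores = defaultdict(float)
    for cot, answer in traces:
        # Keep only a prefix of the original chain of thought.
        prefix = cot[: int(len(cot) * truncate_frac)]
        # Regenerate the remainder several times and count how often
        # the trace's original answer is reproduced.
        hits = 0
        for _ in range(n_regen):
            completion = generate(question + "\n" + prefix)
            if extract_answer(prefix + completion) == answer:
                hits += 1
        # The trace's vote weight is its reproduction frequency.
        scores[answer] += hits / n_regen
    # Return the answer with the highest weighted vote mass.
    return max(scores, key=scores.get)
```

Because the weights come only from counting reproduced answers under regeneration, the sketch illustrates the abstract's claim that no token log-probabilities or self-rating prompts are needed.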