Fractured Chain-of-Thought Reasoning
May 19, 2025
Authors: Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong
cs.AI
Abstract
Inference-time scaling techniques have significantly bolstered the reasoning
capabilities of large language models (LLMs) by harnessing additional
computational effort at inference without retraining. Similarly,
Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy
by generating rich intermediate reasoning trajectories, but these approaches
incur substantial token costs that impede their deployment in latency-sensitive
settings. In this work, we first show that truncated CoT, which stops reasoning
before completion and directly generates the final answer, often matches full
CoT sampling while using dramatically fewer tokens. Building on this insight,
we introduce Fractured Sampling, a unified inference-time strategy that
interpolates between full CoT and solution-only sampling along three orthogonal
axes: (1) the number of reasoning trajectories, (2) the number of final
solutions per trajectory, and (3) the depth at which reasoning traces are
truncated. Through extensive experiments on five diverse reasoning benchmarks
and several model scales, we demonstrate that Fractured Sampling consistently
achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling
gains in Pass@k versus token budget. Our analysis reveals how to allocate
computation across these dimensions to maximize performance, paving the way for
more efficient and scalable LLM reasoning.
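
As a sketch of the method, the three orthogonal axes above compose into an n × h × m grid of candidates: each of n reasoning trajectories is cut at h truncation depths, and m final answers are decoded from each truncated prefix. The snippet below is a minimal illustration under assumptions, with a hypothetical `llm_generate` function standing in for a real inference call; the step segmentation and uniform depth schedule are illustrative choices, not the paper's implementation.

```python
from typing import List, Tuple


def llm_generate(prompt: str, seed: int) -> str:
    """Hypothetical LLM call; replace with a real model or API."""
    return f"[completion of {len(prompt)}-char prompt, seed={seed}]"


def fractured_sampling(question: str, n: int = 4, h: int = 3, m: int = 2
                       ) -> List[Tuple[int, int, int, str]]:
    """Enumerate (trajectory, depth, sample, answer) candidates."""
    candidates = []
    for i in range(n):                    # axis 1: independent reasoning trajectories
        trace = llm_generate(
            f"Question: {question}\nLet's think step by step.", seed=i)
        steps = trace.split(". ")         # crude step segmentation (illustrative)
        for d in range(1, h + 1):         # axis 3: truncation depth of the trace
            cut = max(1, len(steps) * d // h)
            prefix = ". ".join(steps[:cut])
            for j in range(m):            # axis 2: final answers per truncated prefix
                answer = llm_generate(
                    f"Question: {question}\n{prefix}\nFinal answer:",
                    seed=1000 * i + 10 * d + j)
                candidates.append((i, d, j, answer))
    return candidates                     # n * h * m candidates in total
```

In this sketch, h = 1 with m = 1 recovers ordinary full-CoT sampling, while decoding answers from a single shallow prefix approximates solution-only sampling; intermediate settings trade trace tokens for answer diversity. Pass@k in the abstract is conventionally computed with the standard unbiased estimator over N sampled candidates per problem, of which C are correct:

```latex
\text{Pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{N-C}{k}}{\binom{N}{k}} \,\right]
```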