DeepPrune: Parallel Scaling without Inter-trace Redundancy
October 9, 2025
Authors: Shangqing Tu, Yaxuan Li, Yushi Bai, Lei Hou, Juanzi Li
cs.AI
Abstract
Parallel scaling has emerged as a powerful paradigm to enhance reasoning
capabilities in large language models (LLMs) by generating multiple
Chain-of-Thought (CoT) traces simultaneously. However, this approach introduces
significant computational inefficiency due to inter-trace redundancy -- our
analysis reveals that over 80% of parallel reasoning traces yield identical
final answers, representing substantial wasted computation. To address this
critical efficiency bottleneck, we propose DeepPrune, a novel framework that
enables efficient parallel scaling through dynamic pruning. Our method features
a specialized judge model, trained with focal loss and oversampling, that
accurately predicts answer equivalence from partial reasoning traces
(achieving an AUROC of 0.87 on equivalence prediction), combined with an online greedy
clustering algorithm that dynamically prunes redundant paths while preserving
answer diversity. Comprehensive evaluations across three challenging benchmarks
(AIME 2024, AIME 2025, and GPQA) and multiple reasoning models demonstrate that
DeepPrune reduces token consumption by over 80% relative to
conventional consensus sampling in most cases, while maintaining competitive
accuracy within 3 percentage points. Our work sets a new standard for
efficient parallel reasoning, making high-performance reasoning markedly cheaper.
Our code and data are available at: https://deepprune.github.io/
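
The two components named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the focal-loss form is the standard binary focal loss, and `greedy_cluster`, `judge`, the 0.5 threshold, and `keep_per_cluster` are hypothetical names and defaults chosen for the sketch. The judge here is any function returning the probability that two partial traces will yield equivalent final answers.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss for an 'answers will be equivalent' label.

    p: predicted probability that two partial traces lead to the same answer.
    y: gold label (1 = equivalent, 0 = different).
    gamma down-weights easy examples; alpha counters class imbalance
    (equivalent pairs dominate, per the >80% redundancy observation).
    """
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

def greedy_cluster(traces, judge, threshold=0.5, keep_per_cluster=1):
    """Online greedy clustering over partial reasoning traces.

    Each incoming trace is scored by the judge against one representative
    per existing cluster; it joins the first cluster whose score clears
    `threshold`, otherwise it opens a new cluster. Traces beyond
    `keep_per_cluster` in a cluster are pruned as redundant, so one
    trace per predicted answer survives and answer diversity is kept.
    """
    clusters = []            # each cluster is a list; clusters[i][0] is its representative
    kept, pruned = [], []
    for trace in traces:
        placed = False
        for cluster in clusters:
            if judge(cluster[0], trace) >= threshold:
                cluster.append(trace)
                (kept if len(cluster) <= keep_per_cluster else pruned).append(trace)
                placed = True
                break
        if not placed:
            clusters.append([trace])
            kept.append(trace)
    return kept, pruned
```

With a toy judge that deems traces equivalent when they share a first character, `greedy_cluster(["a1", "a2", "b1", "a3"], toy_judge)` keeps one representative per cluster (`"a1"`, `"b1"`) and prunes the rest; the saved computation is whatever decoding those pruned traces to completion would have cost.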