DeepPrune: Parallel Scaling without Inter-trace Redundancy
October 9, 2025
Authors: Shangqing Tu, Yaxuan Li, Yushi Bai, Lei Hou, Juanzi Li
cs.AI
Abstract
Parallel scaling has emerged as a powerful paradigm to enhance reasoning
capabilities in large language models (LLMs) by generating multiple
Chain-of-Thought (CoT) traces simultaneously. However, this approach introduces
significant computational inefficiency due to inter-trace redundancy -- our
analysis reveals that over 80% of parallel reasoning traces yield identical
final answers, representing substantial wasted computation. To address this
critical efficiency bottleneck, we propose DeepPrune, a novel framework that
enables efficient parallel scaling through dynamic pruning. Our method features
a specialized judge model, trained with focal loss and oversampling, that
accurately predicts answer equivalence from partial reasoning traces
(achieving an AUROC of 0.87 on equivalence prediction), combined with an online greedy
clustering algorithm that dynamically prunes redundant paths while preserving
answer diversity. Comprehensive evaluations across three challenging benchmarks
(AIME 2024, AIME 2025, and GPQA) and multiple reasoning models demonstrate that
DeepPrune reduces token consumption by over 80% relative to
conventional consensus sampling in most cases, while maintaining competitive
accuracy within 3 percentage points. Our work sets a new standard for
efficient parallel reasoning, making high-performance reasoning markedly cheaper.
Our code and data are available at: https://deepprune.github.io/
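
The two components named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the focal-loss form is the standard binary focal loss, and `greedy_cluster`, `judge`, the 0.5 threshold, and `keep_per_cluster` are hypothetical names and defaults chosen for the sketch. The judge here is any function returning the probability that two partial traces will yield equivalent final answers.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss for an 'answers will be equivalent' label.

    p: predicted probability that two partial traces lead to the same answer.
    y: gold label (1 = equivalent, 0 = different).
    gamma down-weights easy examples; alpha counters class imbalance
    (equivalent pairs dominate, per the >80% redundancy observation).
    """
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

def greedy_cluster(traces, judge, threshold=0.5, keep_per_cluster=1):
    """Online greedy clustering over partial reasoning traces.

    Each incoming trace is scored by the judge against one representative
    per existing cluster; it joins the first cluster whose score clears
    `threshold`, otherwise it opens a new cluster. Traces beyond
    `keep_per_cluster` in a cluster are pruned as redundant, so one
    trace per predicted answer survives and answer diversity is kept.
    """
    clusters = []            # each cluster is a list; clusters[i][0] is its representative
    kept, pruned = [], []
    for trace in traces:
        placed = False
        for cluster in clusters:
            if judge(cluster[0], trace) >= threshold:
                cluster.append(trace)
                (kept if len(cluster) <= keep_per_cluster else pruned).append(trace)
                placed = True
                break
        if not placed:
            clusters.append([trace])
            kept.append(trace)
    return kept, pruned
```

With a toy judge that deems traces equivalent when they share a first character, `greedy_cluster(["a1", "a2", "b1", "a3"], toy_judge)` keeps one representative per cluster (`"a1"`, `"b1"`) and prunes the rest; the saved computation is whatever decoding those pruned traces to completion would have cost.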