φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
March 17, 2025
作者: Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu
cs.AI
Abstract
Inference-time optimization scales computation to derive deliberate reasoning
steps for effective performance. While previous search-based strategies address
the short-sightedness of auto-regressive generation, the vast search space
leads to excessive exploration and insufficient exploitation. To strike an
efficient balance when deriving the optimal step, we frame the decoding strategy as
foresight sampling, leveraging simulated future steps to obtain globally
optimal step estimation. Building on this, we propose a novel decoding strategy
named phi-Decoding. To provide a precise and expressive estimation of step
value, phi-Decoding approximates two distributions via foresight and
clustering. By sampling from the joint distribution, the optimal steps can be
selected for exploitation. To support adaptive computation allocation, we
propose in-width and in-depth pruning strategies, offering a lightweight
solution to achieve inference efficiency. Extensive experiments across seven
benchmarks show phi-Decoding outperforms strong baselines in both
performance and efficiency. Additional analysis demonstrates its generalization
across various LLMs and scalability across a wide range of computing budgets.
The code will be released at https://github.com/xufangzhi/phi-Decoding, and the
open-source PyPI package is coming soon.
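To make the abstract's high-level description more concrete, below is a minimal Python sketch of foresight-sampling step selection: each candidate step is scored by a simulated future rollout (foresight distribution) and by how well it agrees with a cluster of its peers (clustering distribution), and the next step is sampled from their joint. The `generate_step`, `rollout_logprob`, and `embed` callables, the similarity threshold, and all parameter names are hypothetical placeholders rather than the authors' actual API, and the in-width/in-depth pruning described in the paper is omitted here.

```python
# Hypothetical sketch of foresight sampling, based only on the abstract;
# not the authors' implementation or the phi-Decoding PyPI API.
import math
import random
from typing import Callable, List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def phi_style_step_selection(
    prefix: str,
    generate_step: Callable[[str], str],      # samples one candidate next step (placeholder)
    rollout_logprob: Callable[[str], float],  # scores a simulated future continuation (foresight)
    embed: Callable[[str], List[float]],      # embedding used for clustering (placeholder)
    num_candidates: int = 4,
) -> str:
    """Select the next reasoning step by sampling from the joint of a
    foresight (future-value) distribution and a clustering (agreement)
    distribution, as described at a high level in the abstract."""
    candidates = [generate_step(prefix) for _ in range(num_candidates)]

    # Distribution 1: foresight value -- score each candidate by the
    # quality of its simulated future rollout.
    foresight_scores = [rollout_logprob(prefix + c) for c in candidates]
    p_foresight = softmax(foresight_scores)

    # Distribution 2: cluster alignment -- candidates that agree with a
    # larger group of peers get higher weight. Here a crude proxy: the
    # number of candidates within a cosine-similarity threshold.
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    embs = [embed(c) for c in candidates]
    cluster_sizes = [sum(1 for e2 in embs if cosine(e1, e2) > 0.8) for e1 in embs]
    p_cluster = softmax([float(s) for s in cluster_sizes])

    # Sample the step from the normalized joint distribution.
    joint = [pf * pc for pf, pc in zip(p_foresight, p_cluster)]
    total = sum(joint)
    weights = [j / total for j in joint]
    return random.choices(candidates, weights=weights, k=1)[0]
```

In this reading, exploration happens when sampling candidate steps and their simulated futures, while exploitation happens when the joint distribution concentrates probability on the step whose foresight value and cluster agreement are both high; the paper's pruning strategies would additionally cap how many candidates (width) and how many foresight steps (depth) are computed.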