φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
March 17, 2025
作者: Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu
cs.AI
Abstract
Inference-time optimization scales computation to derive deliberate reasoning
steps for effective performance. While previous search-based strategies address
the short-sightedness of auto-regressive generation, the vast search space
leads to excessive exploration and insufficient exploitation. To strike an
efficient balance when deriving the optimal step, we frame the decoding strategy as
foresight sampling, leveraging simulated future steps to obtain globally
optimal step estimation. Building on this, we propose a novel decoding strategy
named phi-Decoding. To provide a precise and expressive estimation of step
value, phi-Decoding approximates two distributions via foresight and
clustering. By sampling from the joint distribution, the optimal steps can be
selected for exploitation. To support adaptive computation allocation, we
propose in-width and in-depth pruning strategies, offering a lightweight
solution to achieve inference efficiency. Extensive experiments across seven
benchmarks show phi-Decoding outperforms strong baselines in both
performance and efficiency. Additional analysis demonstrates its generalization
across various LLMs and scalability across a wide range of computing budgets.
The code will be released at https://github.com/xufangzhi/phi-Decoding, and the
open-source PyPI package is coming soon.
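To make the abstract's high-level description more concrete, below is a minimal Python sketch of foresight-sampling step selection: each candidate step is scored by a simulated future rollout (foresight distribution) and by how well it agrees with a cluster of its peers (clustering distribution), and the next step is sampled from their joint. The `generate_step`, `rollout_logprob`, and `embed` callables, the similarity threshold, and all parameter names are hypothetical placeholders rather than the authors' actual API, and the in-width/in-depth pruning described in the paper is omitted here.

```python
# Hypothetical sketch of foresight sampling, based only on the abstract;
# not the authors' implementation or the phi-Decoding PyPI API.
import math
import random
from typing import Callable, List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def phi_style_step_selection(
    prefix: str,
    generate_step: Callable[[str], str],      # samples one candidate next step (placeholder)
    rollout_logprob: Callable[[str], float],  # scores a simulated future continuation (foresight)
    embed: Callable[[str], List[float]],      # embedding used for clustering (placeholder)
    num_candidates: int = 4,
) -> str:
    """Select the next reasoning step by sampling from the joint of a
    foresight (future-value) distribution and a clustering (agreement)
    distribution, as described at a high level in the abstract."""
    candidates = [generate_step(prefix) for _ in range(num_candidates)]

    # Distribution 1: foresight value -- score each candidate by the
    # quality of its simulated future rollout.
    foresight_scores = [rollout_logprob(prefix + c) for c in candidates]
    p_foresight = softmax(foresight_scores)

    # Distribution 2: cluster alignment -- candidates that agree with a
    # larger group of peers get higher weight. Here a crude proxy: the
    # number of candidates within a cosine-similarity threshold.
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    embs = [embed(c) for c in candidates]
    cluster_sizes = [sum(1 for e2 in embs if cosine(e1, e2) > 0.8) for e1 in embs]
    p_cluster = softmax([float(s) for s in cluster_sizes])

    # Sample the step from the normalized joint distribution.
    joint = [pf * pc for pf, pc in zip(p_foresight, p_cluster)]
    total = sum(joint)
    weights = [j / total for j in joint]
    return random.choices(candidates, weights=weights, k=1)[0]
```

In this reading, exploration happens when sampling candidate steps and their simulated futures, while exploitation happens when the joint distribution concentrates probability on the step whose foresight value and cluster agreement are both high; the paper's pruning strategies would additionally cap how many candidates (width) and how many foresight steps (depth) are computed.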