φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
March 17, 2025
作者: Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu
cs.AI
Abstract
Inference-time optimization scales computation to derive deliberate reasoning
steps for effective performance. While previous search-based strategies address
the short-sightedness of auto-regressive generation, the vast search space
leads to excessive exploration and insufficient exploitation. To strike an
efficient balance when deriving the optimal step, we frame the decoding strategy
as foresight sampling, leveraging simulated future steps to obtain a globally
optimal step estimate. Building on this, we propose a novel decoding strategy,
named phi-Decoding. To provide a precise and expressive estimation of step
value, phi-Decoding approximates two distributions via foresight and
clustering. Sampling from the joint distribution, the optimal steps can be
selected for exploitation. To support adaptive computation allocation, we
propose in-width and in-depth pruning strategies, offering a lightweight
solution for inference efficiency. Extensive experiments across seven
benchmarks show phi-Decoding outperforms strong baselines in both
performance and efficiency. Additional analysis demonstrates its generalization
across various LLMs and scalability across a wide range of computing budgets.
The code will be released at https://github.com/xufangzhi/phi-Decoding, and the
open-source PyPI package is coming soon.
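The abstract describes selecting each reasoning step by scoring sampled candidates with simulated future rollouts (foresight), combining that with a clustering-based estimate, and pruning in width for efficiency. The following is a minimal, self-contained sketch of that idea, not the authors' implementation: `propose_steps` and `simulate_future` are hypothetical stand-ins for LLM calls, and the exact scoring and pruning rules here are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Toy stand-ins for an LLM; in phi-Decoding these would be model calls.
def propose_steps(state, k):
    """Sample k candidate next steps (hypothetical generator)."""
    return [f"{state}/step{i}" for i in range(k)]

def simulate_future(step):
    """Roll out a simulated future for a step; return (log-prob, final answer)."""
    return random.uniform(-2.0, 0.0), random.choice(["A", "B"])

def phi_decoding_step(state, k=8, keep=4):
    """Pick the next reasoning step via foresight sampling (illustrative)."""
    candidates = propose_steps(state, k)
    rollouts = [simulate_future(c) for c in candidates]

    # Foresight estimate: log-prob of each candidate's simulated future.
    advantages = [lp for lp, _ in rollouts]

    # Clustering estimate: group rollouts by final answer;
    # candidates whose futures land in larger clusters score higher.
    counts = {}
    for _, ans in rollouts:
        counts[ans] = counts.get(ans, 0) + 1
    alignments = [math.log(counts[ans] / k) for _, ans in rollouts]

    # Joint score combines the two estimates.
    scores = [a + b for a, b in zip(advantages, alignments)]

    # In-width pruning: keep only the top-`keep` candidates.
    ranked = sorted(zip(scores, candidates), reverse=True)[:keep]

    # Sample the next step from a softmax over joint scores (exploitation).
    weights = [math.exp(s) for s, _ in ranked]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for w, (_, cand) in zip(weights, ranked):
        acc += w
        if r <= acc:
            return cand
    return ranked[-1][1]

print(phi_decoding_step("root"))
```

The in-depth pruning described in the abstract (stopping rollouts early along the depth dimension) is omitted here for brevity; it would shorten or skip `simulate_future` calls when the step value is already decided.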