φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation

Abstract

Inference-time optimization scales computation to derive deliberate reasoning steps for effective performance. While previous search-based strategies address the short-sightedness of auto-regressive generation, the vast search space leads to excessive exploration and insufficient exploitation. To balance the two efficiently when deriving the optimal step, we frame the decoding strategy as foresight sampling, leveraging simulated future steps to obtain a globally optimal step estimation. Building on this formulation, we propose a novel decoding strategy named φ-Decoding. To provide a precise and expressive estimation of step value, φ-Decoding approximates two distributions via foresight and clustering. By sampling from their joint distribution, it selects the optimal steps for exploitation. To support adaptive computation allocation, we propose in-width and in-depth pruning strategies, which offer a lightweight way to achieve inference efficiency. Extensive experiments across seven benchmarks show that φ-Decoding outperforms strong baselines in both performance and efficiency. Additional analysis demonstrates that it generalizes across various LLMs and scales across a wide range of computing budgets. The code will be released at https://github.com/xufangzhi/phi-Decoding, and the open-source PyPI package is coming soon.
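The abstract does not say how the two distributions are estimated or combined. As an illustration only, the sketch below assumes each candidate step already has two hypothetical scores (one from foresight rollouts, one from clustering), normalizes each with a softmax, and treats the joint distribution as their renormalized product. The function names, the `temperature` parameter, the example scores, and the product rule are all assumptions made for this sketch, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn real-valued scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_distribution(
    foresight_scores: Sequence[float],
    cluster_scores: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Combine two per-candidate distributions into one.

    Assumption: the joint is the elementwise product of the two
    softmax-normalized distributions, renormalized to sum to 1.
    """
    if len(foresight_scores) != len(cluster_scores):
        raise ValueError("both score lists must cover the same candidates")
    p_foresight = softmax(foresight_scores, temperature)
    p_cluster = softmax(cluster_scores, temperature)
    product = [a * b for a, b in zip(p_foresight, p_cluster)]
    total = sum(product)
    return [p / total for p in product]


def sample_step(
    candidates: Sequence[str],
    foresight_scores: Sequence[float],
    cluster_scores: Sequence[float],
    rng: random.Random,
    temperature: float = 1.0,
) -> str:
    """Sample one candidate step according to the joint distribution."""
    weights = joint_distribution(foresight_scores, cluster_scores, temperature)
    return rng.choices(list(candidates), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates and scores, chosen only for illustration.
    candidates = ["step A", "step B", "step C"]
    foresight_scores = [-1.2, -0.4, -2.0]
    cluster_scores = [0.5, 0.3, 0.2]
    rng = random.Random(0)  # seeded so repeated runs give the same draw
    print(sample_step(candidates, foresight_scores, cluster_scores, rng))
```

Sampling from the combined weights, rather than always taking the top-scoring candidate, keeps some exploration while still favoring steps that both scores agree on. A lower `temperature` in this sketch shifts the balance toward exploitation.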
