A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
February 3, 2025
Authors: Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava
cs.AI
Abstract
Large language models (LLMs) have achieved significant performance gains via
scaling up model sizes and/or data. However, recent evidence suggests
diminishing returns from such approaches, motivating scaling the computation
spent at inference time. Existing inference-time scaling methods, usually with
reward models, cast the task as a search problem, which tends to be vulnerable
to reward hacking as a consequence of approximation errors in reward models. In
this paper, we instead cast inference-time scaling as a probabilistic inference
task and leverage sampling-based techniques to explore the typical set of the
state distribution of a state-space model with an approximate likelihood,
rather than optimize for its mode directly. We propose a novel inference-time
scaling approach by adapting particle-based Monte Carlo methods to this task.
Our empirical evaluation demonstrates that our methods have a 4-16x better
scaling rate over our deterministic search counterparts on various challenging
mathematical reasoning tasks. Using our approach, we show that
Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts,
while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts.
Our work not only presents an effective approach to inference-time scaling, but
also connects the rich literature in probabilistic inference with
inference-time scaling of LLMs to develop more robust algorithms in future
work. Code and further information are available at
https://probabilistic-inference-scaling.github.io.
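
To make the contrast between sampling and mode-seeking search more concrete, the following is a minimal sketch of a generic bootstrap-style particle filter over partial solutions. It is not the authors' implementation: `propose_step` and `step_reward` are hypothetical toy stand-ins for sampling one reasoning step from an LLM and scoring it with a reward model, and the multinomial resampling scheme is an assumption chosen for simplicity.

```python
import random


def propose_step(partial, rng):
    # Hypothetical stand-in for sampling one reasoning step from an LLM:
    # here a "step" is just a random integer in {0, 1, 2}.
    return partial + [rng.choice([0, 1, 2])]


def step_reward(partial):
    # Hypothetical stand-in for a reward model score in (0, 1]:
    # larger last steps are scored higher.
    return (partial[-1] + 1) / 3.0


def particle_filter(num_particles, num_steps, seed=0):
    rng = random.Random(seed)
    # Every particle starts as an empty partial solution.
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        # 1. Propagate: extend each particle by one sampled step.
        particles = [propose_step(p, rng) for p in particles]
        # 2. Weight: score each extended particle with the (approximate) reward.
        weights = [step_reward(p) for p in particles]
        # 3. Resample: draw particles with replacement, proportionally to their
        #    weights, instead of deterministically keeping only the top-scoring ones.
        chosen = rng.choices(particles, weights=weights, k=num_particles)
        particles = [list(p) for p in chosen]
    return particles


if __name__ == "__main__":
    for particle in particle_filter(num_particles=4, num_steps=3):
        print(particle)
```

Because resampling is stochastic, a lower-scoring partial solution can still survive a step, which is one way a sampler can be less sensitive to errors in an imperfect reward model than a search that always keeps the highest-scoring candidates.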