
Reward-Guided Speculative Decoding for Efficient LLM Reasoning

January 31, 2025
作者: Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
cs.AI

Abstract

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target model, incorporating a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We theoretically demonstrate that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model only (up to 4.4x fewer FLOPs), while achieving significantly better accuracy than parallel decoding methods on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios.
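
To make the threshold-based mixture strategy concrete, the following Python sketch shows one plausible shape of the per-step decision loop: a draft model proposes a reasoning step, a process reward model scores it, and the target model is invoked only when the score falls below a threshold. The interfaces (`generate_step`, `score_step`), the default threshold, and the stop marker are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of reward-guided speculative decoding at the step level.
# All model and reward-model interfaces are hypothetical placeholders.

def rsd_decode(prompt, draft_model, target_model, reward_model,
               threshold=0.7, max_steps=64):
    """Generate a reasoning trace one step at a time.

    Each step is first proposed by the lightweight draft model and scored by a
    process reward model. If the reward clears the threshold, the draft step is
    accepted; otherwise the more expensive target model regenerates the step.
    """
    trace = prompt
    for _ in range(max_steps):
        draft_step = draft_model.generate_step(trace)         # cheap proposal
        reward = reward_model.score_step(trace, draft_step)   # process reward

        if reward >= threshold:
            step = draft_step                                  # keep draft output
        else:
            step = target_model.generate_step(trace)           # fall back to target

        trace += step
        if step.strip().endswith("<eos>"):                     # hypothetical stop marker
            break
    return trace
```

The threshold controls the cost/quality trade-off the abstract describes: a higher threshold routes more steps to the target model (higher quality, more FLOPs), while a lower threshold keeps more draft steps (cheaper, potentially lower reward).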

