POLCA: Stochastic Generative Optimization with LLM

Abstract

Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem in which a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization (such as noisy feedback, minibatch sampling, and stochastic system behavior) while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an ε-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including τ-bench and HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample- and time-efficient performance, consistently outperforming state-of-the-art algorithms on both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
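The abstract names two mechanisms, a priority queue over candidates with evaluation histories and an ε-Net that preserves diversity. The toy sketch below illustrates how such pieces could fit together; it is not the authors' implementation. Every name and formula in it is an assumption made for illustration: the priority rule (empirical mean plus a bonus that shrinks with the number of evaluations), the Euclidean distance used for the ε-Net check, numeric parameter tuples standing in for prompts or programs, and `propose`, a random perturbation standing in for the LLM optimizer. The LLM Summarizer is omitted.

```python
import heapq
import itertools
import math
import random


class Candidate:
    """A candidate solution together with its noisy evaluation history."""

    def __init__(self, params):
        self.params = tuple(params)
        self.scores = []

    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def priority(self, bonus=1.0):
        # Assumed rule: empirical mean plus a bonus that decays as evaluations accumulate.
        return self.mean() + bonus / math.sqrt(len(self.scores) + 1)


def is_novel(params, kept, eps):
    """Epsilon-net check: accept only points at least eps away from all kept candidates."""
    return all(math.dist(params, c.params) >= eps for c in kept)


def optimize(evaluate, propose, start, steps=300, eps=0.05, seed=0):
    rng = random.Random(seed)
    tie = itertools.count()  # tie-breaker so the heap never compares Candidate objects
    first = Candidate(start)
    kept = [first]
    heap = [(-first.priority(), next(tie), first)]  # negated priority gives a max-heap
    for _ in range(steps):
        _, _, cand = heapq.heappop(heap)
        cand.scores.append(evaluate(cand.params, rng))  # one more noisy evaluation
        heapq.heappush(heap, (-cand.priority(), next(tie), cand))
        new_params = tuple(propose(cand.params, rng))
        if is_novel(new_params, kept, eps):
            child = Candidate(new_params)
            kept.append(child)
            heapq.heappush(heap, (-child.priority(), next(tie), child))
    evaluated = [c for c in kept if c.scores]
    return max(evaluated, key=lambda c: c.mean()), kept


def noisy_reward(params, rng):
    # Hypothetical objective: peak at x = 0.7, observed with Gaussian noise.
    (x,) = params
    return -(x - 0.7) ** 2 + rng.gauss(0.0, 0.05)


def random_step(params, rng):
    # Hypothetical proposal: perturb x and clip to [0, 1].
    (x,) = params
    return (min(1.0, max(0.0, x + rng.gauss(0.0, 0.2))),)


best, kept = optimize(noisy_reward, random_step, start=(0.0,))
```

In this sketch, repeated evaluation of the highest-priority candidate averages out reward noise, while the ε-Net check bounds how many near-duplicate candidates enter the queue.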

