POLCA: Stochastic Generative Optimization with LLM
March 16, 2026
Authors: Xuanfei Ren, Allen Nie, Tengyang Xie, Ching-An Cheng
cs.AI
Abstract
Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem in which a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization (such as noisy feedback, minibatch sampling, and stochastic system behaviors) while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an ε-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including τ-bench, HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample-efficient, and time-efficient performance, consistently outperforming state-of-the-art algorithms in both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
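To make the interplay between the priority queue, the evaluation histories, and the ε-Net filter more concrete, the following is a minimal toy sketch. It is not the authors' implementation and does not use the POLCA codebase. Everything in it is an assumption made for illustration: the names `Candidate`, `optimize`, `toy_reward`, and `toy_propose` are hypothetical, the "system" is a single float, the LLM optimizer is replaced by a random perturbation, the priority is simply the empirical mean reward, and the LLM Summarizer is omitted.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    params: float                                 # toy 1-D stand-in for a prompt/agent/program
    scores: list = field(default_factory=list)    # history of noisy evaluations

    @property
    def mean(self) -> float:
        # Empirical mean reward; used here as the priority (an assumption).
        return sum(self.scores) / len(self.scores)


def toy_reward(x, rng):
    # Hypothetical noisy objective with its optimum at x = 0.7.
    return -(x - 0.7) ** 2 + rng.gauss(0.0, 0.1)


def toy_propose(x, rng):
    # Stand-in for the generative optimizer: a random local perturbation.
    return x + rng.uniform(-0.2, 0.2)


def optimize(reward, propose, init, steps=200, eps=0.05, seed=0):
    rng = random.Random(seed)
    first = Candidate(init, [reward(init, rng)])
    kept = [first]                    # all candidates admitted so far
    heap = [(-first.mean, 0, first)]  # max-priority queue via negated mean
    tie = 1                           # unique tie-breaker so candidates are never compared
    for _ in range(steps):
        # Take the currently most promising candidate.
        _, _, cand = heapq.heappop(heap)
        # Re-evaluate it so that repeated noisy scores average out.
        cand.scores.append(reward(cand.params, rng))
        heapq.heappush(heap, (-cand.mean, tie, cand))
        tie += 1
        # Propose a new candidate; admit it only if it is at least eps away
        # from every kept candidate (an epsilon-net style diversity filter).
        new_params = propose(cand.params, rng)
        if all(abs(new_params - k.params) >= eps for k in kept):
            child = Candidate(new_params, [reward(new_params, rng)])
            kept.append(child)
            heapq.heappush(heap, (-child.mean, tie, child))
            tie += 1
    return max(kept, key=lambda c: c.mean), kept


best, kept = optimize(toy_reward, toy_propose, init=0.0)
print(f"best params: {best.params:.3f}, evaluations: {len(best.scores)}, pool size: {len(kept)}")
```

In this sketch the ε-Net filter bounds how many candidates can accumulate in any region of the parameter space, while re-evaluating the top of the queue keeps a single lucky noisy score from dominating indefinitely. How POLCA actually defines priorities and distances between candidates is not stated in the abstract and is not implied by this example.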