POLCA: Stochastic Generative Optimization with LLM

Abstract

Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem in which a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization (such as noisy feedback, minibatch sampling, and stochastic system behaviors) while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an ε-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including τ-bench, HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample- and time-efficient performance, consistently outperforming state-of-the-art algorithms in both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
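As a rough illustration of how a priority queue over candidates and an ε-Net diversity filter might fit together, consider the minimal Python sketch below. It is not the POLCA implementation: the `Candidate` class, the UCB-style `priority` score, the Euclidean distance over a hypothetical parameter embedding, and every function name are assumptions made for this example only.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    embedding: Tuple[float, ...]  # hypothetical embedding of the candidate's parameters
    rewards: List[float] = field(default_factory=list)  # noisy evaluation history

    def mean(self) -> float:
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0


def priority(c: Candidate, total_evals: int, bonus: float = 1.0) -> float:
    """Assumed UCB-style score: empirical mean plus an exploration bonus."""
    n = len(c.rewards)
    if n == 0:
        return math.inf  # unevaluated candidates are tried first
    return c.mean() + bonus * math.sqrt(math.log(max(total_evals, 2)) / n)


def admit(pool: List[Candidate], new: Candidate, eps: float) -> bool:
    """Epsilon-net style filter: admit `new` only if it lies at least
    `eps` away from every candidate already in the pool."""
    if all(math.dist(new.embedding, c.embedding) >= eps for c in pool):
        pool.append(new)
        return True
    return False


def select(pool: List[Candidate], total_evals: int) -> Candidate:
    """Return the highest-priority candidate (a stand-in for popping a priority queue)."""
    return heapq.nlargest(1, pool, key=lambda c: priority(c, total_evals))[0]
```

In this sketch, a candidate proposed by the LLM optimizer would first pass through `admit`, so near-duplicates never enter the pool, and `select` would then choose which candidate to evaluate or refine next based on its accumulated reward history.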
