Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
September 15, 2023
Authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
cs.AI
Abstract
Large Language Models (LLMs) excel in various tasks, but they rely on
carefully crafted prompts that often demand substantial human effort. To
automate this process, in this paper, we propose a novel framework for discrete
prompt optimization, called EvoPrompt, which borrows the idea of evolutionary
algorithms (EAs) as they exhibit good performance and fast convergence. To
enable EAs to work on discrete prompts, which are natural language expressions
that need to be coherent and human-readable, we connect LLMs with EAs. This
approach allows us to simultaneously leverage the powerful language processing
capabilities of LLMs and the efficient optimization performance of EAs.
Specifically, EvoPrompt requires no gradients or parameters: it starts
from a population of prompts and iteratively generates new prompts with LLMs
based on evolutionary operators, improving the population using a
development set. We optimize prompts for both closed- and open-source LLMs,
including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and
generation tasks. EvoPrompt significantly outperforms human-engineered prompts
and existing methods for automatic prompt generation by up to 25% and 14%
respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs
creates synergies, which could inspire further research on the combination of
LLMs and conventional algorithms.
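The loop described above (a population of prompts, LLM-driven evolutionary operators, and selection on a development set) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_evolve` (an LLM call that crosses over and mutates two parent prompts in natural language) and `score` (accuracy of a prompt on the development set) are hypothetical callables the caller must supply.

```python
import random

def evo_prompt(initial_prompts, score, llm_evolve,
               iterations=10, population_size=10):
    """Sketch of an EvoPrompt-style loop.

    initial_prompts: starting population of prompt strings.
    score: callable mapping a prompt to its dev-set performance (higher is better).
    llm_evolve: callable taking two parent prompts and returning a new
        candidate prompt, e.g. by asking an LLM to combine and mutate them.
    """
    population = list(initial_prompts)
    for _ in range(iterations):
        # Pick two parent prompts at random (a simple GA-style selection).
        parent_a, parent_b = random.sample(population, 2)
        # The LLM plays the role of crossover + mutation operators,
        # so the child stays coherent, human-readable natural language.
        child = llm_evolve(parent_a, parent_b)
        # Evaluate the child on the development set and keep the
        # population at a fixed size by discarding the worst prompts.
        population.append(child)
        population.sort(key=score, reverse=True)
        population = population[:population_size]
    # Return the best prompt found.
    return population[0]
```

A usage example with stub operators (real use would replace them with an LLM call and a dev-set evaluation): `evo_prompt(["Summarize:", "TL;DR:"], score=my_dev_accuracy, llm_evolve=my_llm_operator)`. The gradient-free nature of the loop is what lets it optimize prompts for closed-source models such as GPT-3.5, where only text in and text out is available.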