
Can large language models explore in-context?

March 22, 2024
Authors: Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
cs.AI

Abstract

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.
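
As a rough illustration of the setup described in the abstract, here is a minimal sketch of an in-context bandit loop in Python: a Bernoulli multi-armed bandit, a prompt that presents the externally summarized interaction history as per-arm sufficient statistics (pull counts and empirical mean rewards), and a `query_llm` helper standing in for a real model call. The helper, the prompt wording, and the horizon are illustrative assumptions, not the authors' actual experimental harness.

```python
import random

def make_bernoulli_bandit(means):
    """Return a pull(arm) function for a Bernoulli multi-armed bandit."""
    def pull(arm):
        return 1 if random.random() < means[arm] else 0
    return pull

def summarize_history(counts, totals):
    """Render the externally summarized history as per-arm sufficient
    statistics: number of pulls and empirical mean reward."""
    lines = []
    for arm, (n, s) in enumerate(zip(counts, totals)):
        mean = s / n if n > 0 else float("nan")
        lines.append(f"Arm {arm}: {n} pulls, empirical mean reward {mean:.3f}")
    return "\n".join(lines)

def build_prompt(num_arms, counts, totals):
    """Specify the environment and summarized history entirely in-context."""
    return (
        f"You are playing a {num_arms}-armed bandit and must maximize total reward.\n"
        "Interaction history so far, as summary statistics:\n"
        f"{summarize_history(counts, totals)}\n"
        "Think step by step, then reply with the index of the arm to pull next."
    )

def query_llm(prompt, num_arms):
    """Hypothetical stand-in for an LLM call (e.g., GPT-4 with chain-of-thought
    prompting); a random choice keeps the sketch runnable without API access."""
    return random.randrange(num_arms)

def run_episode(means, horizon=100):
    """Run one bandit episode with the (stubbed) LLM acting as the agent."""
    num_arms = len(means)
    pull = make_bernoulli_bandit(means)
    counts, totals = [0] * num_arms, [0] * num_arms
    for _ in range(horizon):
        arm = query_llm(build_prompt(num_arms, counts, totals), num_arms)
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals

if __name__ == "__main__":
    print(run_episode([0.4, 0.5, 0.6]))
```

Per the abstract, the summarized-history variant replaces the raw interaction transcript with these statistics; the unsummarized configurations instead append every (arm, reward) pair to the prompt, and only the summarized GPT-4 configuration with chain-of-thought explored satisfactorily.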
