
Can large language models explore in-context?

March 22, 2024
Authors: Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
cs.AI

Abstract

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.
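
As a rough illustration of the setup described in the abstract, here is a minimal sketch of an in-context bandit loop in Python: a Bernoulli multi-armed bandit, a prompt that presents the externally summarized interaction history as per-arm sufficient statistics (pull counts and empirical mean rewards), and a `query_llm` helper standing in for a real model call. The helper, the prompt wording, and the horizon are illustrative assumptions, not the authors' actual experimental harness.

```python
import random

def make_bernoulli_bandit(means):
    """Return a pull(arm) function for a Bernoulli multi-armed bandit."""
    def pull(arm):
        return 1 if random.random() < means[arm] else 0
    return pull

def summarize_history(counts, totals):
    """Render the externally summarized history as per-arm sufficient
    statistics: number of pulls and empirical mean reward."""
    lines = []
    for arm, (n, s) in enumerate(zip(counts, totals)):
        mean = s / n if n > 0 else float("nan")
        lines.append(f"Arm {arm}: {n} pulls, empirical mean reward {mean:.3f}")
    return "\n".join(lines)

def build_prompt(num_arms, counts, totals):
    """Specify the environment and summarized history entirely in-context."""
    return (
        f"You are playing a {num_arms}-armed bandit and must maximize total reward.\n"
        "Interaction history so far, as summary statistics:\n"
        f"{summarize_history(counts, totals)}\n"
        "Think step by step, then reply with the index of the arm to pull next."
    )

def query_llm(prompt, num_arms):
    """Hypothetical stand-in for an LLM call (e.g., GPT-4 with chain-of-thought
    prompting); a random choice keeps the sketch runnable without API access."""
    return random.randrange(num_arms)

def run_episode(means, horizon=100):
    """Run one bandit episode with the (stubbed) LLM acting as the agent."""
    num_arms = len(means)
    pull = make_bernoulli_bandit(means)
    counts, totals = [0] * num_arms, [0] * num_arms
    for _ in range(horizon):
        arm = query_llm(build_prompt(num_arms, counts, totals), num_arms)
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals

if __name__ == "__main__":
    print(run_episode([0.4, 0.5, 0.6]))
```

Per the abstract, the summarized-history variant replaces the raw interaction transcript with these statistics; the unsummarized configurations instead append every (arm, reward) pair to the prompt, and only the summarized GPT-4 configuration with chain-of-thought explored satisfactorily.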
