Can large language models explore in-context?

Abstract

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on the native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) no other configuration resulted in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization, which may not be possible in more complex settings, is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision-making agents in complex settings.
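
To make the distinction between a raw and an externally summarized interaction history concrete, here is a minimal Python sketch. It assumes a bandit history stored as (arm, reward) pairs and reduces it to per-arm sufficient statistics (pull count and average reward) before rendering it as prompt text. The function names, the prompt wording, and the example rewards are illustrative assumptions, not the prompts or data used in the paper.

```python
def summarize_history(history, num_arms):
    """Reduce a list of (arm, reward) pairs to per-arm (pulls, mean reward).

    The mean is None for arms that have never been pulled.
    """
    pulls = [0] * num_arms
    totals = [0.0] * num_arms
    for arm, reward in history:
        pulls[arm] += 1
        totals[arm] += reward
    return [(n, (t / n if n else None)) for n, t in zip(pulls, totals)]


def format_raw(history):
    """Render the unsummarized history, one line per round (hypothetical wording)."""
    return "\n".join(
        f"Round {t}: arm {arm}, reward {reward}"
        for t, (arm, reward) in enumerate(history, start=1)
    )


def format_summary(stats):
    """Render the summarized history, one line per arm (hypothetical wording)."""
    lines = []
    for arm, (n, mean) in enumerate(stats):
        if n == 0:
            lines.append(f"Arm {arm}: pulled 0 times")
        else:
            lines.append(f"Arm {arm}: pulled {n} times, average reward {mean:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example history for a 4-armed bandit with 0/1 rewards.
    example = [(0, 1), (1, 0), (0, 0), (2, 1), (0, 1)]
    print(format_raw(example))
    print()
    print(format_summary(summarize_history(example, num_arms=4)))
```

The summary grows with the number of arms rather than the number of rounds, which is one plausible reason it is easier for a model to use. Such a compact statistic exists for simple bandits but, as the abstract notes, may not be available in more complex settings.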
