Statler: State-Maintaining Language Models for Embodied Reasoning

Abstract

Large language models (LLMs) provide a promising tool that enables robots to perform complex reasoning tasks. However, the limited context window of contemporary LLMs makes reasoning over long time horizons difficult. Embodied tasks, such as those one might expect a household robot to perform, typically require that the planner consider information acquired a long time ago (e.g., properties of the many objects that the robot previously encountered in the environment). Attempts to capture the world state using an LLM's implicit internal representation are complicated by the paucity of task- and environment-relevant information available in a robot's action history, while methods that rely on conveying information to the LLM via the prompt are subject to its limited context window. In this paper, we propose Statler, a framework that endows LLMs with an explicit representation of the world state as a form of "memory" that is maintained over time. Integral to Statler is its use of two instances of general LLMs, a world-model reader and a world-model writer, that interface with and maintain the world state. By providing access to this world state "memory", Statler improves the ability of existing LLMs to reason over longer time horizons without the constraint of context length. We evaluate the effectiveness of our approach on three simulated table-top manipulation domains and a real robot domain, and show that it improves on the state of the art in LLM-based robot reasoning. Project website: https://statler-lm.github.io/
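The reader/writer split described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `StatefulAgent`, the prompt wording, the string-valued state, and the stub functions standing in for LLM calls are all assumptions made for illustration. The point it shows is that each call sees only the current explicit state and the new query, so the prompt does not grow with the interaction history.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class StatefulAgent:
    """Illustrative reader/writer loop around an explicit world state."""
    reader: LLM  # proposes an action from the current state and a query
    writer: LLM  # produces the updated state from the current state and an action
    state: str   # explicit world state, stored outside the model's context

    def step(self, query: str) -> str:
        # The reader is given the current state and the query, not the full history.
        action = self.reader(f"State:\n{self.state}\nQuery: {query}\nAction:")
        # The writer rewrites the state to reflect the effect of that action.
        self.state = self.writer(
            f"State:\n{self.state}\nAction: {action}\nNew state:"
        )
        return action


# Deterministic stand-ins for real LLM calls (assumptions, for demonstration only).
def stub_reader(prompt: str) -> str:
    return "pick(red_block)"


def stub_writer(prompt: str) -> str:
    return "holding: red_block"


agent = StatefulAgent(reader=stub_reader, writer=stub_writer, state="holding: nothing")
action = agent.step("pick up the red block")
print(action, "|", agent.state)
```

In a real system the two stubs would be replaced by calls to an actual model, and the state would likely be a structured object rather than a flat string; both choices are left open here.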
