SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Abstract

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage integrates the strengths of behavior cloning and prompting large language models (LLMs) to enhance task completion performance. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a heuristic method to harmoniously integrate the two modules, resulting in a more efficient and robust problem-solving process. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion, demonstrating its effectiveness in solving complex real-world tasks.
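
As a rough illustration of the two-module design described above, the following is a minimal sketch of a controller that acts with a fast policy by default and falls back to a slow planner when progress stalls. The abstract does not specify the integration heuristic, so the switching rule here (no reward for `patience` consecutive steps) is an assumption, and every name (`DualProcessAgent`, `FastPolicy`, `SlowPlanner`, `patience`) is hypothetical. In a real system the fast policy would wrap the fine-tuned encoder-decoder LM and the slow planner would wrap an LLM call.

```python
from collections import deque
from typing import Callable, Deque, List

# Hypothetical type aliases: an observation string in, action string(s) out.
FastPolicy = Callable[[str], str]          # stands in for the Swift module
SlowPlanner = Callable[[str], List[str]]   # stands in for the Sage module


class DualProcessAgent:
    """Toy controller: act fast by default, plan slowly when progress stalls."""

    def __init__(self, fast: FastPolicy, slow: SlowPlanner, patience: int = 3):
        self.fast = fast
        self.slow = slow
        self.patience = patience            # assumed stall threshold, not from the paper
        self.stalled_steps = 0
        self.buffer: Deque[str] = deque()   # planned actions waiting to be executed

    def act(self, observation: str, last_reward: float) -> str:
        # Count consecutive steps without any reward.
        self.stalled_steps = 0 if last_reward > 0 else self.stalled_steps + 1

        # Execute any plan the slow module produced earlier.
        if self.buffer:
            return self.buffer.popleft()

        # Assumed switching rule: consult the slow planner after a stall.
        if self.stalled_steps >= self.patience:
            self.stalled_steps = 0
            self.buffer.extend(self.slow(observation))
            if self.buffer:
                return self.buffer.popleft()

        return self.fast(observation)


# Stub modules for demonstration only.
agent = DualProcessAgent(
    fast=lambda obs: "look around",
    slow=lambda obs: ["open door", "go to kitchen"],
    patience=2,
)
actions = [agent.act("you are in a hallway", r) for r in (0.0, 0.0, 0.0, 1.0)]
print(actions)
```

Buffering the planner's output means the expensive module is called once per stall rather than once per step, which is one plausible way to get the efficiency benefit the abstract mentions.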
