SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Abstract

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage integrates the strengths of behavior cloning and prompting large language models (LLMs) to enhance task completion performance. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a heuristic method to harmoniously integrate the two modules, resulting in a more efficient and robust problem-solving process. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion, demonstrating its effectiveness in solving complex real-world tasks.
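The two-module design described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the integration heuristic, so the trigger used here (switch to the slow planner after `patience` steps without reward) is assumed for the example. The names `DualProcessAgent`, `FastPolicy`, `SlowPlanner`, and `patience` are hypothetical, and the two callables merely stand in for the Swift and Sage modules.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces: each policy maps the interaction history to action(s).
FastPolicy = Callable[[List[str]], str]         # stands in for the Swift module
SlowPlanner = Callable[[List[str]], List[str]]  # stands in for the Sage module


@dataclass
class DualProcessAgent:
    fast: FastPolicy
    slow: SlowPlanner
    patience: int = 2   # assumed: steps without reward before switching modes
    stalled: int = 0    # consecutive steps without reward so far
    buffer: List[str] = field(default_factory=list)  # queued slow-mode actions

    def act(self, history: List[str], last_reward: float) -> str:
        # Track how long the agent has gone without a positive reward.
        self.stalled = 0 if last_reward > 0 else self.stalled + 1
        # Finish any plan the slow planner produced earlier.
        if self.buffer:
            return self.buffer.pop(0)
        # Assumed trigger: fall back to deliberate planning when progress stalls.
        if self.stalled >= self.patience:
            self.stalled = 0
            self.buffer = list(self.slow(history))
            if self.buffer:
                return self.buffer.pop(0)
        # Default: the cheap, fast policy picks the next action.
        return self.fast(history)
```

In this sketch the fast policy handles routine steps, and the slower planner is consulted only when the assumed stall condition fires, after which its planned actions are executed one at a time before control returns to the fast policy.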

