Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Abstract

Existing agents based on large language models (LLMs) demonstrate robust problem-solving capabilities by integrating the LLMs' inherent knowledge, their strong in-context learning and zero-shot capabilities, and tool use combined with intricate, human-designed LLM invocation workflows. However, these agents still fall short in long-term reasoning and under-use the potential of existing tools, leading to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by efficiently leveraging a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to enhance the management and sharing of knowledge and conversation history throughout the system. Furthermore, guided by Society of Mind Theory, Sibyl implements a multi-agent debate-based jury to self-refine the final answers, ensuring a comprehensive and balanced approach. This approach aims to reduce system complexity while expanding the scope of solvable problems, from matters typically resolved by humans in minutes to those requiring hours or even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl has been designed with a focus on scalability and ease of debugging, incorporating the concept of reentrancy from functional programming from its inception, so that it can be integrated into other LLM applications seamlessly and with low effort to improve their capabilities. Our experimental results on the GAIA benchmark test set show that the Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance among GPT-4-based agents, with an average score of 34.55%. We hope that Sibyl can inspire more reliable and reusable LLM-based agent solutions to address complex real-world reasoning tasks.
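As a rough illustration of the reentrancy idea mentioned in the abstract, the sketch below passes all shared state explicitly through an immutable workspace object, so a run can be paused and resumed from any intermediate state. It is not Sibyl's actual implementation: the names `Workspace`, `step`, and `run`, and the fields `notes` and `history`, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Workspace:
    """Hypothetical shared state: accumulated notes and conversation history."""
    notes: Tuple[str, ...] = ()
    history: Tuple[str, ...] = ()


def step(ws: Workspace, observation: str) -> Workspace:
    """Reentrant step: reads only its arguments and returns a new Workspace.

    No global or hidden state is touched, so the same inputs always
    produce the same output.
    """
    return replace(
        ws,
        notes=ws.notes + (f"note: {observation}",),
        history=ws.history + (observation,),
    )


def run(observations: Iterable[str], ws: Workspace = Workspace()) -> Workspace:
    """Fold a sequence of observations into the workspace, one step at a time."""
    for obs in observations:
        ws = step(ws, obs)
    return ws


# Running straight through and resuming from a saved intermediate
# workspace yield the same final state.
full = run(["a", "b", "c"])
partial = run(["a", "b"])
resumed = run(["c"], partial)
```

Because each step depends only on its inputs, an intermediate `Workspace` can be logged, inspected, or replayed in isolation, which is one plausible reading of how reentrancy supports the ease of debugging the abstract describes.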
