Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
July 15, 2024
Authors: Yulong Wang, Tianhao Shen, Lifeng Liu, Jian Xie
cs.AI
Abstract
Existing agents based on large language models (LLMs) demonstrate robust
problem-solving capabilities by integrating LLMs' inherent knowledge, strong
in-context learning and zero-shot capabilities, and the use of tools combined
with intricately designed LLM invocation workflows by humans. However, these
agents still exhibit shortcomings in long-term reasoning and underutilize the
potential of existing tools, leading to noticeable deficiencies in complex
real-world reasoning scenarios. To address these limitations, we introduce
Sibyl, a simple yet powerful LLM-based agent framework designed to tackle
complex reasoning tasks by efficiently leveraging a minimal set of tools.
Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global
workspace to enhance the management and sharing of knowledge and conversation
history throughout the system. Furthermore, guided by Society of Mind Theory,
Sibyl implements a multi-agent debate-based jury to self-refine the final
answers, ensuring a comprehensive and balanced approach. This approach aims to
reduce system complexity while expanding the scope of solvable problems, from
matters typically resolved by humans in minutes to those requiring hours or
even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl
has been designed from its inception with a focus on scalability and ease of
debugging, incorporating the concept of reentrancy from functional
programming, with the aim of seamless, low-effort integration into other LLM
applications to improve their capabilities. Our experimental results on the GAIA
benchmark test set reveal that the Sibyl agent instantiated with GPT-4 achieves
state-of-the-art performance with an average score of 34.55%, compared to other
agents based on GPT-4. We hope that Sibyl can inspire more reliable and
reusable LLM-based agent solutions to address complex real-world reasoning
tasks.
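The multi-agent debate-based jury described in the abstract can be sketched minimally as below. This is a hypothetical illustration only, assuming a shared transcript standing in for the global workspace, stub juror functions standing in for LLM calls, and a simple majority-vote aggregation rule; none of these details come from the paper itself.

```python
from collections import Counter

def jury_vote(candidate_answers):
    """Aggregate jurors' answers by majority vote (hypothetical rule)."""
    return Counter(candidate_answers).most_common(1)[0][0]

def debate_round(jurors, question, transcript):
    """Each juror sees the shared transcript (stand-in for the global
    workspace) before answering, so later jurors can react to earlier ones."""
    answers = []
    for juror in jurors:
        answer = juror(question, transcript)
        transcript.append(answer)  # share the answer with later jurors
        answers.append(answer)
    return answers

def self_refine(jurors, question, rounds=2):
    """Run several debate rounds over a shared transcript, then vote."""
    transcript = []
    answers = []
    for _ in range(rounds):
        answers = debate_round(jurors, question, transcript)
    return jury_vote(answers)

# Stub jurors standing in for LLM calls.
jurors = [
    lambda q, t: "42",
    lambda q, t: "42",
    lambda q, t: "41",
]
print(self_refine(jurors, "What is 6 * 7?"))  # majority answer: "42"
```

In a real instantiation each juror would be an LLM call conditioned on the question and the shared transcript, and the refinement loop would let jurors revise answers after seeing the debate so far.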