Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
July 15, 2024
Authors: Yulong Wang, Tianhao Shen, Lifeng Liu, Jian Xie
cs.AI
Abstract
Existing agents based on large language models (LLMs) demonstrate robust
problem-solving capabilities by integrating LLMs' inherent knowledge, strong
in-context learning and zero-shot capabilities, and the use of tools combined
with intricately designed LLM invocation workflows by humans. However, these
agents still exhibit shortcomings in long-term reasoning and underutilize the
potential of existing tools, leading to noticeable deficiencies in complex
real-world reasoning scenarios. To address these limitations, we introduce
Sibyl, a simple yet powerful LLM-based agent framework designed to tackle
complex reasoning tasks by efficiently leveraging a minimal set of tools.
Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global
workspace to enhance the management and sharing of knowledge and conversation
history throughout the system. Furthermore, guided by Society of Mind Theory,
Sibyl implements a multi-agent debate-based jury to self-refine the final
answers, ensuring a comprehensive and balanced approach. This approach aims to
reduce system complexity while expanding the scope of solvable problems, from
matters typically resolved by humans in minutes to those requiring hours or
even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl
has been designed from its inception with a focus on scalability and ease of
debugging, incorporating the concept of reentrancy from functional
programming, with the aim of seamless, low-effort integration into other LLM
applications to improve their capabilities. Our experimental results on the GAIA
benchmark test set reveal that the Sibyl agent instantiated with GPT-4 achieves
state-of-the-art performance with an average score of 34.55%, compared to other
agents based on GPT-4. We hope that Sibyl can inspire more reliable and
reusable LLM-based agent solutions to address complex real-world reasoning
tasks.
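The multi-agent debate-based jury described in the abstract can be sketched minimally as below. This is a hypothetical illustration only, assuming a shared transcript standing in for the global workspace, stub juror functions standing in for LLM calls, and a simple majority-vote aggregation rule; none of these details come from the paper itself.

```python
from collections import Counter

def jury_vote(candidate_answers):
    """Aggregate jurors' answers by majority vote (hypothetical rule)."""
    return Counter(candidate_answers).most_common(1)[0][0]

def debate_round(jurors, question, transcript):
    """Each juror sees the shared transcript (stand-in for the global
    workspace) before answering, so later jurors can react to earlier ones."""
    answers = []
    for juror in jurors:
        answer = juror(question, transcript)
        transcript.append(answer)  # share the answer with later jurors
        answers.append(answer)
    return answers

def self_refine(jurors, question, rounds=2):
    """Run several debate rounds over a shared transcript, then vote."""
    transcript = []
    answers = []
    for _ in range(rounds):
        answers = debate_round(jurors, question, transcript)
    return jury_vote(answers)

# Stub jurors standing in for LLM calls.
jurors = [
    lambda q, t: "42",
    lambda q, t: "42",
    lambda q, t: "41",
]
print(self_refine(jurors, "What is 6 * 7?"))  # majority answer: "42"
```

In a real instantiation each juror would be an LLM call conditioned on the question and the shared transcript, and the refinement loop would let jurors revise answers after seeing the debate so far.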