

Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

July 15, 2024
作者: Yulong Wang, Tianhao Shen, Lifeng Liu, Jian Xie
cs.AI

Abstract

Existing agents based on large language models (LLMs) demonstrate robust problem-solving capabilities by integrating LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and tool use combined with intricately designed LLM invocation workflows crafted by humans. However, these agents still exhibit shortcomings in long-term reasoning and underuse the potential of existing tools, leading to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by efficiently leveraging a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to enhance the management and sharing of knowledge and conversation history throughout the system. Furthermore, guided by Society of Mind Theory, Sibyl implements a multi-agent debate-based jury to self-refine the final answers, ensuring a comprehensive and balanced approach. This approach aims to reduce system complexity while expanding the scope of solvable problems, from matters typically resolved by humans in minutes to those requiring hours or even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl has been designed with a focus on scalability and ease of debugging by incorporating the concept of reentrancy from functional programming from its inception, with the aim of seamless, low-effort integration into other LLM applications to improve their capabilities. Our experimental results on the GAIA benchmark test set reveal that the Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance with an average score of 34.55%, compared to other agents based on GPT-4. We hope that Sibyl can inspire more reliable and reusable LLM-based agent solutions for addressing complex real-world reasoning tasks.
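The two mechanisms the abstract names, a global workspace shared across the system and a multi-agent debate jury that refines the final answer, can be sketched in miniature as below. This is an illustrative assumption, not the paper's actual API: the names `GlobalWorkspace` and `jury_vote` are invented here, real jurors would be LLM calls rather than fixed strings, and the jury is reduced to a single majority-vote round in place of a full debate.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GlobalWorkspace:
    """Shared store for knowledge and conversation history,
    loosely modeled on Global Workspace Theory (hypothetical sketch)."""
    history: list = field(default_factory=list)

    def post(self, source: str, content: str) -> None:
        # Any agent can broadcast into the shared workspace.
        self.history.append((source, content))

    def read(self) -> list:
        # Every agent sees the same accumulated history.
        return list(self.history)


def jury_vote(candidate_answers: list) -> str:
    """Pick the majority answer among jurors: a one-round stand-in
    for the multi-agent debate-based self-refinement described above."""
    counts = Counter(candidate_answers)
    answer, _ = counts.most_common(1)[0]
    return answer


# Usage: three hypothetical jurors post answers to the workspace,
# then the jury settles on the majority answer.
ws = GlobalWorkspace()
for juror, ans in [("juror_1", "42"), ("juror_2", "42"), ("juror_3", "41")]:
    ws.post(juror, ans)
final = jury_vote([content for _, content in ws.read()])
# final == "42"
```

The design point the abstract emphasizes, reentrancy, would show up here as the workspace being an explicit value passed between calls rather than hidden mutable state, so any step can be re-entered or replayed during debugging.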

