OAgents: An Empirical Study of Building Effective Agents
June 17, 2025
Authors: He Zhu, Tianrui Qin, King Zhu, Heyuan Huang, Yeyi Guan, Jinxiang Xia, Yi Yao, Hanhao Li, Ningning Wang, Pai Liu, Tianhao Peng, Xin Gui, Xiaowan Li, Yuhui Liu, Yuchen Eleanor Jiang, Jun Wang, Changwang Zhang, Xiangru Tang, Ge Zhang, Jian Yang, Minghao Liu, Xitong Gao, Jiaheng Liu, Wangchunshu Zhou
cs.AI
Abstract
Recently, Agentic AI has become an increasingly popular research field.
However, we argue that current agent research practices lack standardization
and scientific rigor, making it hard to conduct fair comparisons among methods.
As a result, it is still unclear how different design choices in agent
frameworks affect effectiveness, and measuring their progress remains
challenging. In this work, we conduct a systematic empirical study on the GAIA
benchmark and BrowseComp to examine the impact of popular design choices in key
agent components in a fair and rigorous manner. We find that the lack of a
standard evaluation protocol makes prior work, even open-source work, difficult
to reproduce, with significant variance across random runs. Therefore, we
introduce a more robust evaluation protocol to stabilize comparisons. Our study
reveals which components and designs are crucial for effective agents and
which, despite seeming sensible, are redundant. Based on our findings, we build
and open-source OAgents, a new foundation agent framework that achieves
state-of-the-art performance among open-source projects. OAgents offers a
modular design for various agent components, promoting future research in
Agentic AI.
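
The abstract does not detail the robust evaluation protocol, but the problem it targets is significant variance across random runs. Below is a minimal sketch of one common way to stabilize such comparisons, repeating the full benchmark and reporting mean and standard deviation of accuracy. The function `evaluate_with_repeats`, the `run_agent` callable, and the number of repeats are illustrative assumptions, not the paper's actual protocol.

```python
import statistics
from typing import Callable, Iterable


def evaluate_with_repeats(
    run_agent: Callable[[dict], bool],  # hypothetical: runs one task, returns correctness
    tasks: Iterable[dict],
    n_runs: int = 3,
) -> dict:
    """Average benchmark accuracy over repeated independent runs.

    Repeating the whole benchmark and reporting mean and std makes
    comparisons between agent variants less sensitive to run-to-run noise.
    """
    task_list = list(tasks)
    per_run_accuracy = []
    for _ in range(n_runs):
        correct = sum(run_agent(task) for task in task_list)
        per_run_accuracy.append(correct / len(task_list))
    return {
        "mean_accuracy": statistics.mean(per_run_accuracy),
        "std_accuracy": statistics.stdev(per_run_accuracy) if n_runs > 1 else 0.0,
        "per_run_accuracy": per_run_accuracy,
    }
```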
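The abstract also highlights OAgents' modular design for agent components. As a purely illustrative sketch, assuming nothing about OAgents' real API, the interfaces below show how planner, tool, and memory components might be made swappable so that each design choice can be ablated independently. Every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod


class Planner(ABC):
    """Decides the next action given the task and interaction history."""

    @abstractmethod
    def next_step(self, task: str, history: list[str]) -> str: ...


class Tool(ABC):
    """A capability the agent can invoke (search, browsing, code execution, ...)."""

    name: str

    @abstractmethod
    def run(self, arguments: dict) -> str: ...


class Memory(ABC):
    """Stores and retrieves past observations for long-horizon tasks."""

    @abstractmethod
    def store(self, record: str) -> None: ...

    @abstractmethod
    def retrieve(self, query: str, k: int = 5) -> list[str]: ...


class Agent:
    """Composes swappable components; ablate one by replacing its implementation."""

    def __init__(self, planner: Planner, tools: list[Tool], memory: Memory) -> None:
        self.planner = planner
        self.tools = {tool.name: tool for tool in tools}
        self.memory = memory
```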