OAgents: An Empirical Study of Building Effective Agents

June 17, 2025
Authors: He Zhu, Tianrui Qin, King Zhu, Heyuan Huang, Yeyi Guan, Jinxiang Xia, Yi Yao, Hanhao Li, Ningning Wang, Pai Liu, Tianhao Peng, Xin Gui, Xiaowan Li, Yuhui Liu, Yuchen Eleanor Jiang, Jun Wang, Changwang Zhang, Xiangru Tang, Ge Zhang, Jian Yang, Minghao Liu, Xitong Gao, Jiaheng Liu, Wangchunshu Zhou
cs.AI

Abstract

Recently, Agentic AI has become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to conduct fair comparisons among methods. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring their progress remains challenging. In this work, we conduct a systematic empirical study on the GAIA benchmark and BrowseComp to examine the impact of popular design choices in key agent components in a fair and rigorous manner. We find that the lack of a standard evaluation protocol makes previous works, even open-sourced ones, non-reproducible, with significant variance between random runs. Therefore, we introduce a more robust evaluation protocol to stabilize comparisons. Our study reveals which components and designs are crucial for effective agents and which are redundant despite seeming logical. Based on our findings, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects. OAgents offers a modular design for various agent components, promoting future research in Agentic AI.
June 24, 2025