What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

November 19, 2025
Authors: Alexis Audran-Reiss, Jordi Armengol Estapé, Karen Hambardzumyan, Amar Budhiraja, Martin Josifoski, Edan Toledo, Rishi Hazra, Despoina Magka, Michael Shvartsman, Parth Pathak, Justine T Kao, Lucia Cipolina-Kun, Bhavul Gauri, Jean-Christophe Gagnon-Audet, Emanuel Tewolde, Jenny Zhang, Taco Cohen, Yossi Adi, Tatiana Shavrina, Yoram Bachrach
cs.AI

Abstract

AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, and the key factors driving the success or failure of agent trajectories are not fully understood. We examine the role that ideation diversity plays in agent performance. First, we analyse agent trajectories on MLE-bench, a well-known benchmark to evaluate AI research agents, across different models and agent scaffolds. Our analysis reveals that different models and agent scaffolds yield varying degrees of ideation diversity, and that higher-performing agents tend to have increased ideation diversity. Further, we run a controlled experiment where we modify the degree of ideation diversity, demonstrating that higher ideation diversity results in stronger performance. Finally, we strengthen our results by examining additional evaluation metrics beyond the standard medal-based scoring of MLE-bench, showing that our findings still hold across other agent performance metrics.