AgentRxiv: Towards Collaborative Autonomous Research
March 23, 2025
Authors: Samuel Schmidgall, Michael Moor
cs.AI
Abstract
Progress in scientific discovery is rarely the result of a single "Eureka"
moment, but is rather the product of hundreds of scientists incrementally
working together toward a common goal. While existing agent workflows are
capable of producing research autonomously, they do so in isolation, without
the ability to continuously improve upon prior research results. To address
these challenges, we introduce AgentRxiv, a framework that lets LLM agent
laboratories upload and retrieve reports from a shared preprint server in order
to collaborate, share insights, and iteratively build on each other's research.
We task agent laboratories with developing new reasoning and prompting
techniques and find that agents with access to their prior research achieve
larger performance improvements than agents operating in isolation (11.4%
relative improvement over baseline on MATH-500). The best-performing
strategy also generalizes to benchmarks in other domains (improving on
average by 3.3%). Multiple agent laboratories sharing research through
AgentRxiv are able to work together towards a common goal, progressing more
rapidly than isolated laboratories and achieving higher overall accuracy (13.7%
relative improvement over baseline on MATH-500). These findings suggest that
autonomous agents may play a role in designing future AI systems alongside
humans. We hope that AgentRxiv allows agents to collaborate toward research
goals and enables researchers to accelerate discovery.
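
The upload-and-retrieve loop described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the names `Report`, `PreprintServer`, `upload`, `retrieve`, and `relative_improvement` are hypothetical and are not taken from AgentRxiv, and the keyword-overlap ranking is a simple stand-in for whatever retrieval method the framework actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """A research report written by one agent laboratory (hypothetical type)."""
    lab: str
    title: str
    body: str


@dataclass
class PreprintServer:
    """Hypothetical in-memory stand-in for a shared preprint server."""
    reports: list = field(default_factory=list)

    def upload(self, report: Report) -> int:
        """Store a report and return its index on the server."""
        self.reports.append(report)
        return len(self.reports) - 1

    def retrieve(self, query: str, k: int = 3) -> list:
        """Return up to k reports ranked by keyword overlap with the query.

        Keyword overlap is an assumed placeholder ranking, chosen only to
        keep the sketch self-contained.
        """
        terms = set(query.lower().split())
        scored = []
        for report in self.reports:
            words = set((report.title + " " + report.body).lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, report))
        # Sort on the overlap count only; ties keep upload order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [report for _, report in scored[:k]]


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement over a baseline accuracy, in percent."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    server = PreprintServer()
    server.upload(Report("lab_a", "Step-wise prompting", "improves math accuracy"))
    server.upload(Report("lab_b", "Retrieval notes", "unrelated topic"))
    for hit in server.retrieve("prompting math"):
        print(hit.lab, "-", hit.title)
    # Illustrative numbers only, not results from the paper.
    print(relative_improvement(50.0, 55.0))
```

In this sketch a laboratory would call `retrieve` before starting a new experiment and `upload` after finishing one, so later runs can build on earlier reports. The `relative_improvement` helper shows how a relative gain over baseline is computed; the accuracies passed to it above are made-up inputs.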