AgentRxiv: Towards Collaborative Autonomous Research
March 23, 2025
Authors: Samuel Schmidgall, Michael Moor
cs.AI
Abstract
Progress in scientific discovery is rarely the result of a single "Eureka"
moment, but is rather the product of hundreds of scientists incrementally
working together toward a common goal. While existing agent workflows are
capable of producing research autonomously, they do so in isolation, without
the ability to continuously improve upon prior research results. To address
these challenges, we introduce AgentRxiv, a framework that lets LLM agent
laboratories upload and retrieve reports from a shared preprint server in order
to collaborate, share insights, and iteratively build on each other's research.
We task agent laboratories with developing new reasoning and prompting techniques
and find that agents with access to their prior research achieve higher
performance improvements compared to agents operating in isolation (11.4%
relative improvement over baseline on MATH-500). We find that the best-performing
strategy generalizes to benchmarks in other domains (improving on
average by 3.3%). Multiple agent laboratories sharing research through
AgentRxiv are able to work together towards a common goal, progressing more
rapidly than isolated laboratories, achieving higher overall accuracy (13.7%
relative improvement over baseline on MATH-500). These findings suggest that
autonomous agents may play a role in designing future AI systems alongside
humans. We hope that AgentRxiv allows agents to collaborate toward research
goals and enables researchers to accelerate discovery.
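
The upload-and-retrieve loop described in the abstract can be illustrated with a small sketch. Everything here is hypothetical: `Report`, `SharedPreprintStore`, and the keyword-overlap ranking are stand-ins chosen for brevity and are not AgentRxiv's actual API or retrieval method. The `relative_improvement` helper only spells out the standard definition behind figures such as "11.4% relative improvement over baseline"; the numbers passed to it in any call are the caller's own, not results from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    lab: str    # hypothetical identifier of the uploading agent laboratory
    title: str
    body: str


@dataclass
class SharedPreprintStore:
    """Hypothetical in-memory stand-in for a shared preprint server."""
    reports: list = field(default_factory=list)

    def upload(self, report: Report) -> int:
        """Store a report and return its index in the store."""
        self.reports.append(report)
        return len(self.reports) - 1

    def retrieve(self, query: str, top_k: int = 3) -> list:
        """Rank reports by how many query words occur in their title or body.

        Keyword overlap is an assumption made for this sketch; a real system
        would likely use a stronger similarity measure.
        """
        words = set(query.lower().split())
        scored = []
        for report in self.reports:
            text = set((report.title + " " + report.body).lower().split())
            score = len(words & text)
            if score > 0:
                scored.append((score, report))
        # Stable sort: ties keep upload order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [report for _, report in scored[:top_k]]


def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (new - baseline) / baseline


# Two hypothetical labs share reports; a third party then searches the store.
store = SharedPreprintStore()
store.upload(Report("lab_a", "Stepwise verification prompting",
                    "check each reasoning step"))
store.upload(Report("lab_b", "Few-shot retrieval prompting",
                    "retrieve similar solved problems"))
hits = store.retrieve("reasoning step verification")
```

In this sketch a lab would call `retrieve` before starting a new experiment and `upload` after finishing one, which is the minimal shape of building on prior reports rather than working in isolation.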