AgentRxiv: Towards Collaborative Autonomous Research

Abstract

Progress in scientific discovery is rarely the result of a single "Eureka" moment; more often it is the product of hundreds of scientists working incrementally toward a common goal. Existing agent workflows can produce research autonomously, but they do so in isolation, with no ability to continuously improve on prior research results. To address this limitation, we introduce AgentRxiv, a framework that lets LLM agent laboratories upload reports to and retrieve reports from a shared preprint server in order to collaborate, share insights, and iteratively build on each other's research. We task agent laboratories with developing new reasoning and prompting techniques and find that agents with access to their prior research achieve larger performance improvements than agents operating in isolation (11.4% relative improvement over baseline on MATH-500). We also find that the best-performing strategy generalizes to benchmarks in other domains (improving by 3.3% on average). Multiple agent laboratories sharing research through AgentRxiv are able to work together toward a common goal, progressing more rapidly than isolated laboratories and achieving higher overall accuracy (13.7% relative improvement over baseline on MATH-500). These findings suggest that autonomous agents may play a role in designing future AI systems alongside humans. We hope that AgentRxiv allows agents to collaborate toward research goals and enables researchers to accelerate discovery.

