Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Abstract

Despite impressive progress on complex reasoning, current large language models (LLMs) typically operate in isolation: they treat each problem as an independent attempt, without accumulating or integrating experiential knowledge. In contrast, expert problem solvers, such as Olympiad or programming contest teams, draw on a rich tapestry of experiences: absorbing mentorship from coaches, developing intuition from past problems, applying knowledge of tool usage and library functionality, adapting strategies based on the expertise and experiences of peers, continuously refining their reasoning through trial and error, and learning from related problems even during competition. We introduce Xolver, a training-free multi-agent reasoning framework that equips a black-box LLM with a persistent, evolving memory of holistic experience. Xolver integrates diverse experience modalities, including external and self-retrieval, tool use, collaborative interactions, agent-driven evaluation, and iterative refinement. By learning from relevant strategies, code fragments, and abstract reasoning patterns at inference time, Xolver avoids generating solutions from scratch, marking a transition from isolated inference toward experience-aware language agents. Built on both open-weight and proprietary models, Xolver consistently outperforms specialized reasoning agents. Even with lightweight backbones (e.g., QwQ-32B), it often surpasses advanced models including Qwen3-235B, Gemini 2.5 Pro, o3, and o4-mini-high. With o3-mini-high, it achieves new best results on GSM8K (98.1%), AIME'24 (94.4%), AIME'25 (93.7%), Math-500 (99.8%), and LiveCodeBench-V5 (91.6%), highlighting holistic experience learning as a key step toward generalist agents capable of expert-level reasoning. Code and data are available at https://kagnlp.github.io/xolver.github.io/.
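To make the idea of a persistent, evolving experience memory more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy setup in which several agents propose answers, a judge scores them, the best attempt so far is fed back for refinement, and the final result is stored for later retrieval. All names here (`Experience`, `ExperienceMemory`, `solve`, `careless_agent`, `refining_agent`, `toy_judge`) are hypothetical, and the word-overlap retrieval stands in for whatever retrieval method a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Experience:
    """One scored attempt at a problem (hypothetical structure)."""
    problem: str
    solution: str
    score: float


@dataclass
class ExperienceMemory:
    """Store of past attempts that persists across problems (assumed design)."""
    items: List[Experience] = field(default_factory=list)

    def retrieve(self, problem: str, k: int = 2) -> List[Experience]:
        # Toy similarity: count of shared lowercase words with the query.
        words = set(problem.lower().split())

        def overlap(e: Experience) -> int:
            return len(words & set(e.problem.lower().split()))

        ranked = sorted(self.items, key=lambda e: (overlap(e), e.score), reverse=True)
        return ranked[:k]

    def add(self, exp: Experience) -> None:
        self.items.append(exp)


Agent = Callable[[str, List[Experience]], str]
Judge = Callable[[str, str], float]


def solve(problem: str, agents: List[Agent], judge: Judge,
          memory: ExperienceMemory, rounds: int = 3) -> Experience:
    """Iteratively propose, score, and refine; then store the best attempt."""
    best: Optional[Experience] = None
    for _ in range(rounds):
        # Context = stored past experience plus the best attempt so far.
        context = memory.retrieve(problem)
        if best is not None:
            context = [best] + context
        for agent in agents:
            candidate = agent(problem, context)
            exp = Experience(problem, candidate, judge(problem, candidate))
            if best is None or exp.score > best.score:
                best = exp
        if best.score >= 1.0:  # judge is assumed to return 1.0 for "accepted"
            break
    memory.add(best)
    return best


# --- Toy stand-ins for LLM agents and a judge (assumptions for the demo) ---
def careless_agent(problem: str, context: List[Experience]) -> str:
    return "5"  # always gives the same answer


def refining_agent(problem: str, context: List[Experience]) -> str:
    # Nudges a previously rejected numeric answer upward by one.
    for exp in context:
        if exp.score < 1.0 and exp.solution.isdigit():
            return str(int(exp.solution) + 1)
    return "0"


def toy_judge(problem: str, solution: str) -> float:
    expected = sum(int(t) for t in problem.split() if t.isdigit())
    return 1.0 if solution == str(expected) else 0.0


memory = ExperienceMemory()
result = solve("sum 1 2 3", [careless_agent, refining_agent], toy_judge, memory)
print(result.solution, result.score)
```

In this toy run the first round yields only rejected answers; in the second round the refining agent sees the rejected attempt in its context and corrects it, after which the accepted result is stored so that a later, similar problem can retrieve it.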
