AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Abstract

Search-augmented large language models (LLMs) often struggle with complex reasoning tasks because of ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single LLM to alternate between two roles: a decomposer that breaks down complex queries into sub-questions, and a solver that integrates retrieved contexts to generate answers. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final-answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, with an average exact match (EM) improvement of 7.6%. Notably, on document-level financial reasoning tasks, AceSearcher-32B matches the performance of DeepSeek-V3 while using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its efficiency and effectiveness on complex reasoning tasks. Our code will be released at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.
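The decomposer/solver interplay and the final-answer reward described in the abstract can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the callables `decompose`, `retrieve`, and `solve`, the `"sub-question -> answer"` note format, and the toy corpus are hypothetical stand-ins for the single LLM playing both roles and for a real retriever. The reward is assumed to be a SQuAD-style exact match computed on the final answer only.

```python
import re
import string
from typing import Callable, List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold_answers: List[str]) -> float:
    """Assumed outcome reward: 1.0 if the final answer exactly matches any gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def decompose_then_solve(
    question: str,
    decompose: Callable[[str], List[str]],              # hypothetical decomposer role
    retrieve: Callable[[str], List[str]],               # hypothetical retriever
    solve: Callable[[str, List[str], List[str]], str],  # hypothetical solver role
) -> str:
    """One rollout: decompose, answer each sub-question with retrieved context, then answer the original."""
    notes: List[str] = []
    for sub_q in decompose(question):
        answer = solve(sub_q, retrieve(sub_q), notes)
        notes.append(f"{sub_q} -> {answer}")
    # Final solver call sees the original question plus all intermediate sub-answers.
    return solve(question, retrieve(question), notes)


# --- Toy stand-ins (hypothetical); in AceSearcher both roles are played by one LLM. ---
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]


def toy_decompose(question: str) -> List[str]:
    return ["Who directed Inception?", "Where was Christopher Nolan born?"]


def toy_retrieve(query: str) -> List[str]:
    # Keyword-overlap retrieval over the toy corpus.
    words = set(normalize_answer(query).split())
    return [doc for doc in CORPUS if words & set(normalize_answer(doc).split())]


def toy_solve(question: str, docs: List[str], notes: List[str]) -> str:
    lookup = {
        "Who directed Inception?": "Christopher Nolan",
        "Where was Christopher Nolan born?": "London",
    }
    # Unknown (composite) questions fall back to the last sub-answer.
    fallback = notes[-1].split(" -> ")[-1] if notes else "unknown"
    return lookup.get(question, fallback)


prediction = decompose_then_solve(
    "Where was the director of Inception born?", toy_decompose, toy_retrieve, toy_solve
)
reward = exact_match_reward(prediction, ["london"])
print(prediction, reward)  # London 1.0
```

Because the reward looks only at the final answer, the decomposition and the sub-answers receive credit solely through the outcome of the full rollout, which is what allows training without intermediate annotations.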
