SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
September 8, 2025
Authors: Xuan-Phi Nguyen, Shrey Pandit, Revanth Gangi Reddy, Austin Xu, Silvio Savarese, Caiming Xiong, Shafiq Joty
cs.AI
Abstract
Equipping large language models (LLMs) with complex, interleaved reasoning and tool-use capabilities has become a key focus in agentic AI research, especially with recent advances in reasoning-oriented ("thinking") models. Such capabilities are key to unlocking a number of important applications. One such application is Deep Research (DR), which requires extensive search and reasoning over many sources. Our work in this paper focuses on the development of native Autonomous Single-Agent models for DR featuring minimal web crawling and Python tool integration. Unlike multi-agent systems, where agents take on pre-defined roles and are told what to do at each step of a static workflow, an autonomous single agent determines its next action dynamically based on context, without manual direction. While prior work has proposed training recipes for base or instruction-tuned LLMs, we focus on continual reinforcement learning (RL) of reasoning-optimized models to further enhance agentic skills while preserving reasoning ability. Towards this end, we propose a simple RL recipe with entirely synthetic data, which we apply to various open-source LLMs. Our best variant, SFR-DR-20B, achieves up to 28.7% on the Humanity's Last Exam benchmark. In addition, we conduct key analysis experiments to provide more insight into our methodology.
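To make the contrast with static multi-agent workflows concrete, the autonomous single-agent pattern described above can be sketched as a loop in which the model itself picks the next action (search, code execution, or finishing) from the accumulated context at every step. This is a minimal illustrative sketch, not the paper's implementation: the policy is a stub standing in for the reasoning LLM, and all names (`stub_policy`, `TOOLS`, `run_agent`) are assumptions.

```python
# Minimal sketch of an autonomous single-agent loop: at each step the
# model (here a stub policy, standing in for a reasoning LLM) inspects
# the full interaction history and decides the next action itself,
# instead of following a pre-scripted multi-agent workflow.
# All names and tool signatures here are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class AgentState:
    question: str
    history: list = field(default_factory=list)  # (action, observation) pairs


def stub_policy(state: AgentState) -> tuple[str, str]:
    """Stand-in for the LLM: returns (action, argument).

    A real system would prompt a reasoning model with the question and
    the full history, and parse its chosen tool call from the output.
    """
    if not state.history:
        return ("search", state.question)  # first step: gather sources
    if len(state.history) < 2:
        # toy example of the Python tool: compute something about the input
        return ("python", "len(%r)" % state.question)
    return ("finish", "answer derived from history")


# Minimal tool set mirroring the abstract: web crawling + Python execution.
TOOLS = {
    "search": lambda q: f"top results for {q!r}",  # stub web search
    "python": lambda code: str(eval(code)),        # toy code execution
}


def run_agent(question: str, max_steps: int = 8) -> str:
    state = AgentState(question)
    for _ in range(max_steps):
        action, arg = stub_policy(state)
        if action == "finish":
            return arg  # the agent, not a controller, decides it is done
        observation = TOOLS[action](arg)
        state.history.append((action, observation))  # context grows each step
    return "max steps reached"
```

The key design point is that control flow lives inside the policy: adding a tool or changing the stopping criterion requires no change to the loop, only to what the model chooses to emit, which is what makes such an agent amenable to end-to-end RL training.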