AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research
February 6, 2026
Authors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most existing approaches follow a plan-then-write paradigm, whose performance heavily depends on the quality of the initial outline. However, constructing a comprehensive outline itself demands strong reasoning ability, causing current deep research systems to rely almost exclusively on closed-source or online large models. This reliance raises practical barriers to deployment and introduces safety and privacy concerns for user-authored data. In this work, we present AgentCPM-Report, a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process and an 8B-parameter deep research agent. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines during report generation. Under this policy, the agent alternates between Evidence-Based Drafting and Reasoning-Driven Deepening, jointly supporting information acquisition, knowledge refinement, and iterative outline evolution. To effectively equip small models with this capability, we introduce a Multi-Stage Agentic Training strategy, consisting of cold-start, atomic skill RL, and holistic pipeline RL. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems, with substantial gains in Insight.
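The abstract describes WARP only at a high level. As a rough illustration of the control flow it implies, here is a minimal sketch, not the authors' implementation. It assumes a report is an ordered list of sections. It also assumes that `retrieve`, `draft`, and `deepen` are hypothetical stand-ins for the agent's search tool and model calls. Drafting writes each new section from retrieved evidence. Deepening inspects the current draft and may propose new sections, and the loop stops when the outline stops changing or the round budget runs out. In the paper the 8B agent performs these steps; here they are toy stubs so the loop can run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Section:
    title: str
    text: str = ""         # drafted content for this section
    drafted: bool = False  # whether evidence-based drafting has run on it


@dataclass
class Report:
    outline: List[Section] = field(default_factory=list)


def warp_loop(
    report: Report,
    retrieve: Callable[[str], List[str]],       # hypothetical search tool
    draft: Callable[[str, List[str]], str],     # hypothetical drafting call
    deepen: Callable[[Report], List[str]],      # hypothetical outline-revision call
    max_rounds: int = 3,
) -> Report:
    """Alternate drafting and deepening until the outline stops growing."""
    for round_idx in range(max_rounds):
        # Evidence-based drafting: write every section not yet drafted.
        for sec in report.outline:
            if not sec.drafted:
                sec.text = draft(sec.title, retrieve(sec.title))
                sec.drafted = True
        if round_idx == max_rounds - 1:
            break  # budget exhausted; do not add sections we cannot draft
        # Reasoning-driven deepening: propose new sections from the draft.
        existing = {s.title for s in report.outline}
        additions = [t for t in deepen(report) if t not in existing]
        if not additions:
            break  # outline has converged
        report.outline.extend(Section(t) for t in additions)
    return report


# Toy stubs (hypothetical) so the loop runs without a model or search engine.
def toy_retrieve(query: str) -> List[str]:
    return [f"fact about {query}"]


def toy_draft(title: str, evidence: List[str]) -> str:
    return f"{title}: " + "; ".join(evidence)


def toy_deepen(report: Report) -> List[str]:
    # Propose one "Details" subsection for each top-level section lacking one.
    titles = [s.title for s in report.outline]
    return [f"{t} / Details" for t in titles
            if "/" not in t and f"{t} / Details" not in titles]


r = warp_loop(Report([Section("Background"), Section("Methods")]),
              toy_retrieve, toy_draft, toy_deepen)
for s in r.outline:
    print(s.text)
```

Here the first round drafts the two initial sections and deepening adds two subsections. The second round drafts those, after which deepening proposes nothing new and the loop ends. The design choice worth noting is that outline revision happens between drafting passes rather than once up front, which is the contrast with plan-then-write the abstract draws.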