

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

April 21, 2026
Authors: Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang
cs.AI

Abstract

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B-parameter deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain together with format-aware regularization, thereby increasing supervision density and sharpening turn-level credit assignment. Built entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
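The abstract describes turn-level rewards based on information gain plus format-aware regularization, but does not give the exact formulation. The sketch below is a minimal illustration of that idea, not the paper's actual IGPO-based reward: the `Turn` fields, the binary-entropy belief model, and the `fmt_penalty` value are all hypothetical choices for exposition.

```python
from dataclasses import dataclass
import math


@dataclass
class Turn:
    """One agent turn, summarized by the model's belief about the final answer
    before and after the turn, plus whether the turn's output obeyed the
    required tool/format schema (all fields are illustrative assumptions)."""
    p_correct_before: float  # estimated P(final answer is correct) before the turn
    p_correct_after: float   # same estimate after the turn's observation
    well_formatted: bool     # did the turn follow the expected format?


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(turn: Turn) -> float:
    """Reduction in belief entropy across the turn: a turn that makes the
    agent more certain about the answer earns positive gain."""
    return binary_entropy(turn.p_correct_before) - binary_entropy(turn.p_correct_after)


def turn_reward(turn: Turn, fmt_penalty: float = 0.5) -> float:
    """Dense turn-level reward: information gain, minus a format-aware
    penalty when the turn violates the output schema."""
    r = information_gain(turn)
    if not turn.well_formatted:
        r -= fmt_penalty
    return r
```

Under this toy model, a turn that raises answer confidence from 0.5 to 0.9 earns a positive reward, while a malformed turn that learns nothing is penalized; the point is only that every turn receives a signal, which is the "supervision density" the abstract refers to.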