
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

April 21, 2026
Authors: Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang
cs.AI

Abstract

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under a limited open-data budget by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
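The turn-level reward described above (information gain plus format-aware regularization) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the functions `information_gain`, `format_penalty`, and `turn_reward`, the `<tool_call>` schema, and the specific formulas are all assumptions; the paper's concrete reward design may differ.

```python
# Hypothetical sketch of an IGPO-style turn-level reward: the dense,
# per-turn signal is the increase in the policy's probability of the
# gold answer after observing this turn's tool results, minus a
# format-aware penalty for malformed tool calls. All names and
# formulas here are illustrative assumptions.

def information_gain(prior_answer_prob: float, posterior_answer_prob: float) -> float:
    """Gain = increase in the probability assigned to the gold answer
    after this turn's observation, clipped at zero so uninformative
    turns are not punished by the gain term itself."""
    return max(0.0, posterior_answer_prob - prior_answer_prob)

def format_penalty(turn_text: str) -> float:
    """Penalize turns that violate the expected tool-call schema
    (here: unbalanced <tool_call> tags, as a stand-in check)."""
    well_formed = turn_text.count("<tool_call>") == turn_text.count("</tool_call>")
    return 0.0 if well_formed else 0.5

def turn_reward(prior_p: float, posterior_p: float, turn_text: str, lam: float = 1.0) -> float:
    """Dense per-turn reward: information gain minus weighted format penalty."""
    return information_gain(prior_p, posterior_p) - lam * format_penalty(turn_text)
```

A well-formed turn that raises the answer probability receives a positive reward, while a malformed turn is penalized even if it gains some information, which is one way the recipe could densify supervision and sharpen turn-level credit assignment.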