SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
December 26, 2025
Authors: Shaofei Cai, Yulei Qin, Haojia Lin, Zihan Xu, Gang Li, Yuchen Shi, Zongyi Li, Yong Mao, Siqi Cai, Xiaoyu Tan, Yitao Liang, Ke Li, Xing Sun
cs.AI
Abstract
Agentic reinforcement learning (RL) holds great promise for developing autonomous agents for complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (e.g., a rule-based scoring script, a reward or critic model, or an LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeded. Processing such verbose context, laden with irrelevant, noisy history, strains the verification protocols and leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent with a dual mission: not only to complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification over a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier, which determines its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that the SmartSnap paradigm enables scalable training of LLM-driven agents, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking cultivates efficient, self-verifying agents whose performance is competitive with DeepSeek V3.1 and Qwen3-235B-A22B.
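To make the verification protocol concrete, below is a minimal, illustrative sketch of how a curated evidence package might be assembled and handed to an LLM judge. All names here (`Snapshot`, `EvidencePackage`, `build_judge_prompt`) and the snapshot budget are hypothetical conveniences, not the paper's actual interfaces; the only point carried over from the abstract is that the judge sees the curated snapshots alone, never the full interaction trajectory.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    step: int        # index of the action after which the screen was captured
    caption: str     # the agent's one-line claim about what this snapshot shows
    image_path: str  # path to the saved screen capture

@dataclass
class EvidencePackage:
    task: str  # natural-language task description
    snapshots: list[Snapshot] = field(default_factory=list)

    def add(self, snap: Snapshot, budget: int = 3) -> bool:
        """Enforce Conciseness: refuse to grow past a small snapshot budget
        (hypothetical value), forcing a minimal, decisive evidence set."""
        if len(self.snapshots) >= budget:
            return False
        self.snapshots.append(snap)
        return True

def build_judge_prompt(pkg: EvidencePackage) -> str:
    """Assemble the verifier prompt from the curated evidence only;
    the agent's full, noisy interaction history is deliberately omitted."""
    lines = [
        f"Task: {pkg.task}",
        "Judge from the snapshots below whether the task was completed.",
        "Answer SUCCESS or FAILURE.",
    ]
    for snap in pkg.snapshots:
        lines.append(f"[snapshot @ step {snap.step}] {snap.caption} ({snap.image_path})")
    return "\n".join(lines)

if __name__ == "__main__":
    pkg = EvidencePackage(task="Set a 7:00 AM alarm in the Clock app")
    pkg.add(Snapshot(step=5, caption="Alarm list shows a 7:00 AM alarm, enabled",
                     image_path="snap_05.png"))
    print(build_judge_prompt(pkg))  # this text plus the images would go to the judge
```

In the actual system, the screen captures themselves would accompany this text as multimodal input to the judge, and details such as the snapshot budget and prompt wording would be design choices of the method rather than the fixed constants shown here.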