SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

January 30, 2026
Authors: Jinyang Wu, Changpeng Yang, Yuhao Shen, Fangzhi Xu, Bolin Ni, Chonghua Liao, Yuchen Liu, Hongzhen Wang, Shuai Nie, Shuai Zhang, Haoran Luo, Jiaming Xu
cs.AI

Abstract

Reinforcement learning with verifiable rewards has emerged as a powerful paradigm for training intelligent agents. However, existing methods typically employ binary rewards, which cannot distinguish between trajectories that reach the same outcome with different quality, and therefore overlook the potential diversity within the solution space. Inspired by the "sweet spot" in tennis, the core region of the racket that produces the best hitting effect, we introduce Sweet Spot Learning (SSL), a framework that provides differentiated guidance for agent optimization. SSL follows a simple yet effective principle: progressively amplified, tiered rewards guide policies toward the sweet-spot region of the solution space. This principle adapts naturally across diverse tasks. Visual perception tasks use distance-tiered modeling to reward proximity to the target, while complex reasoning tasks reward incremental progress toward promising solutions. We show theoretically that SSL preserves optimal solution ordering and improves the gradient signal-to-noise ratio, which yields more directed optimization. Extensive experiments on GUI perception, short- and long-term planning, and complex reasoning tasks show consistent improvements over strong baselines on 12 benchmarks, with up to 2.5× gains in sample efficiency and effective cross-task transferability. Our work establishes SSL as a general principle for training capable and robust agents.
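The contrast between binary and distance-tiered rewards for visual perception can be illustrated with a minimal Python sketch for GUI click grounding. It is an assumption-laden illustration, not the paper's formulation. The `Box` type, the normalized Chebyshev distance, and the tier boundaries `TIERS` and bonuses `BONUSES` are all hypothetical choices. A miss always scores 0 and a hit always scores at least 1. The tiers therefore only separate trajectories that already share the same binary outcome, in the spirit of the ordering-preservation property described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical target UI element: top-left (x1, y1), bottom-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


# Hypothetical tiers on normalized distance (0 = box center, 1 = box edge).
# Bonuses grow as the tiers move toward the center ("progressively amplified").
TIERS = (0.25, 0.5, 0.75)
BONUSES = (0.5, 0.25, 0.1)


def binary_reward(x: float, y: float, box: Box) -> float:
    """Baseline: every hit scores the same, regardless of where it lands."""
    return 1.0 if box.contains(x, y) else 0.0


def tiered_reward(x: float, y: float, box: Box,
                  tiers=TIERS, bonuses=BONUSES) -> float:
    """Hypothetical distance-tiered reward: 0 for a miss, 1 plus a tier bonus for a hit."""
    if not box.contains(x, y):
        return 0.0  # misses stay strictly below every hit
    cx, cy = (box.x1 + box.x2) / 2, (box.y1 + box.y2) / 2
    half_w = max((box.x2 - box.x1) / 2, 1e-9)  # guard against zero-width boxes
    half_h = max((box.y2 - box.y1) / 2, 1e-9)
    # Normalized Chebyshev distance: 0 at the center, 1 on the box boundary.
    d = max(abs(x - cx) / half_w, abs(y - cy) / half_h)
    for bound, bonus in zip(tiers, bonuses):
        if d < bound:
            return 1.0 + bonus  # tiers are ascending, so the innermost match wins
    return 1.0  # outermost ring of the box: plain hit


box = Box(0, 0, 100, 50)
group = [(50, 25), (70, 25), (99, 25)]  # three rollouts that all hit the box
print([binary_reward(x, y, box) for x, y in group])  # [1.0, 1.0, 1.0]
print([tiered_reward(x, y, box) for x, y in group])  # [1.5, 1.25, 1.0]
```

Under the binary reward, the three hits are indistinguishable. The tiered version ranks them by proximity to the center, and every hit still outranks any miss.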
February 3, 2026