Signals: Trajectory Sampling and Triage for Agentic Interactions

Abstract

Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic, and reviewing each one, whether through human review or auxiliary LLMs, is slow and cost-prohibitive. We propose a lightweight, signal-based framework for triaging agentic interaction trajectories. Our approach computes cheap, broadly applicable signals from live interactions and attaches them as structured attributes for trajectory triage, identifying interactions likely to be informative without affecting online agent behavior. We organize signals into a coarse-grained taxonomy spanning interaction (misalignment, stagnation, disengagement, satisfaction), execution (failure, loop), and environment (exhaustion), designed for computation without model calls. In a controlled annotation study on τ-bench, a widely used benchmark for tool-augmented agent evaluation, we show that signal-based sampling achieves an 82% informativeness rate compared to 74% for heuristic filtering and 54% for random sampling, with a 1.52x efficiency gain per informative trajectory. The advantage is robust across reward strata and task domains, confirming that signals provide genuine per-trajectory informativeness gains rather than merely oversampling obvious failures. These results show that lightweight signals can serve as practical sampling infrastructure for agentic systems, and suggest a path toward preference data construction and post-deployment optimization.
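As a rough illustration of what "signals attached as structured attributes" might look like, the sketch below encodes the seven signal categories named in the abstract and computes two of them (failure and loop) with plain rules and no model calls. Everything beyond the category names is an assumption made for illustration: the `Step` and `Trajectory` types, their fields, the repeated-call heuristic for loops, the `loop_threshold` parameter, and the "keep anything with a signal" triage rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Signal(Enum):
    """Taxonomy from the abstract; the string values are illustrative."""
    MISALIGNMENT = "interaction.misalignment"
    STAGNATION = "interaction.stagnation"
    DISENGAGEMENT = "interaction.disengagement"
    SATISFACTION = "interaction.satisfaction"
    FAILURE = "execution.failure"
    LOOP = "execution.loop"
    EXHAUSTION = "environment.exhaustion"


@dataclass
class Step:
    # Hypothetical record of one tool call in a trajectory.
    tool: str
    args: str
    ok: bool = True


@dataclass
class Trajectory:
    # Hypothetical container; signals are attached as a structured attribute.
    steps: list
    signals: set = field(default_factory=set)


def annotate(traj, loop_threshold=3):
    """Attach rule-based signals to a trajectory without any model calls."""
    # Assumed rule: any unsuccessful tool call counts as an execution failure.
    if any(not s.ok for s in traj.steps):
        traj.signals.add(Signal.FAILURE)
    # Assumed rule: the same call repeated loop_threshold times in a row is a loop.
    run = 1
    for prev, cur in zip(traj.steps, traj.steps[1:]):
        same = (prev.tool, prev.args) == (cur.tool, cur.args)
        run = run + 1 if same else 1
        if run >= loop_threshold:
            traj.signals.add(Signal.LOOP)
            break
    return traj


def triage(trajectories):
    """Keep trajectories that carry at least one signal (assumed policy)."""
    return [t for t in map(annotate, trajectories) if t.signals]
```

Because annotation only reads logged steps, a pass like this could run offline over stored trajectories, which is in keeping with the abstract's point that triage should not affect online agent behavior.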
