RedAct: Redacting Agent Capability Traces to Protect Procedural Skills

Abstract

Users rely on execution traces to observe agent behavior, diagnose failures, and ensure accountability. These traces contain rich procedural detail, including tool invocations, intermediate decisions, and error-recovery logic. Yet this detail can expose private procedural skills, allowing downstream methods to recover key formulas, thresholds, and strategies without access to model weights or skill files. To quantify this risk and evaluate protection, we construct CapTraceBench, a benchmark of 75 specialized long-horizon tasks and 154 curated skills across seven domains. We also introduce RedAct (https://github.com/XuShuwenn/RedAct), a protected trace release framework that localizes protected key information, rewrites traces while preserving verifier-critical evidence, and embeds behavioral watermarks for downstream provenance analysis. Across representative trace reuse methods, RedAct reduces normalized skill transfer (NST) from 44.7–67.1% on raw traces to below the no-skill baseline, while preserving audit evidence. Its standalone behavioral watermarks reach 93.6–100.0% true detection with a false alarm rate of at most 1.9%. These results frame public agent traces as security interfaces and show that selective redaction can reduce procedural capability leakage without removing audit evidence.