Unsupervised Skill Discovery for Agentic Data Analysis

Abstract

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, because reliable supervision is expensive and success criteria vary across analytical formats. This raises a key question: how can reusable data-analysis skills be discovered from unlabeled exploration alone? We propose DataCOPE, an unsupervised, verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates three components: a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. In both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% on report-style tasks and by 32.30% on reasoning-style tasks.
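
As a rough illustration of the two verifier signals described above, the following Python sketch groups trajectories by their final answer, reports the majority share as a self-consistency signal, and scores a report by the fraction of checklist items it satisfies. It is a minimal sketch under stated assumptions, not DataCOPE's implementation: the trajectory dictionaries, the lowercase/strip answer normalization, the predicate-based checklist, and the function names `group_by_answer`, `self_consistency`, and `checklist_coverage` are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def group_by_answer(trajectories):
    """Group trajectories by a normalized final answer.

    Assumption: each trajectory is a dict with an "answer" string, and
    stripping whitespace plus lowercasing is an adequate normalization.
    """
    groups = defaultdict(list)
    for traj in trajectories:
        key = traj["answer"].strip().lower()
        groups[key].append(traj)
    return dict(groups)


def self_consistency(groups):
    """Return the largest answer group and its share of all trajectories."""
    total = sum(len(members) for members in groups.values())
    answer, members = max(groups.items(), key=lambda kv: len(kv[1]))
    return answer, len(members) / total


def checklist_coverage(report, checklist):
    """Fraction of checklist items that the report satisfies.

    Assumption: each checklist item is a plain predicate on the report text,
    standing in for whatever task-specific criteria a verifier would derive.
    """
    if not checklist:
        return 0.0
    hits = sum(1 for check in checklist if check(report))
    return hits / len(checklist)


# Toy data, invented for illustration only.
trajectories = [
    {"id": 0, "answer": "42"},
    {"id": 1, "answer": " 42 "},
    {"id": 2, "answer": "41"},
]
groups = group_by_answer(trajectories)
majority_answer, agreement = self_consistency(groups)

report = "Revenue rose 5% in Q3."
checklist = [
    lambda text: "revenue" in text.lower(),
    lambda text: "q4" in text.lower(),
]
coverage = checklist_coverage(report, checklist)
```

In this toy setup, trajectories inside and outside the majority group, or reports with higher and lower coverage, would form the contrastive pairs from which a skill manager could distill skills; how DataCOPE actually forms and uses such pairs is not specified in the abstract.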