Unsupervised Skill Discovery for Agentic Data Analysis

Abstract

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, because reliable supervision is expensive and success criteria vary across analytical formats. This raises a key question: how can reusable data-analysis skills be discovered from unlabeled exploration alone? We propose DataCOPE, an unsupervised, verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates three components: a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. In both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% on report-style tasks and by 32.30% on reasoning-style tasks.
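To make the idea of grouping trajectories by answer agreement more concrete, the following is a minimal Python sketch, not the DataCOPE implementation. It assumes each trajectory ends in a short string answer, that agreement means equality after a simple normalization, and that self-consistency can be approximated as the share of trajectories in the largest group. The names `normalize_answer`, `group_by_answer_agreement`, and `self_consistency` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_answer(answer: str) -> str:
    """Assumed normalization: trim, lowercase, and collapse whitespace."""
    return " ".join(answer.strip().lower().split())


def group_by_answer_agreement(
    trajectories: List[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group trajectory ids by their normalized final answer.

    Each trajectory is a (trajectory_id, final_answer) pair; no labels are used.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for trajectory_id, answer in trajectories:
        groups[normalize_answer(answer)].append(trajectory_id)
    return dict(groups)


def self_consistency(groups: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of trajectories giving it."""
    total = sum(len(ids) for ids in groups.values())
    if total == 0:
        raise ValueError("no trajectories to score")
    majority = max(groups, key=lambda answer: len(groups[answer]))
    return majority, len(groups[majority]) / total


# Illustrative input only: four sampled trajectories for one task.
example = [("t1", " 42 "), ("t2", "42"), ("t3", "41"), ("t4", "42")]
example_groups = group_by_answer_agreement(example)
example_majority, example_score = self_consistency(example_groups)
```

Under these assumptions, trajectories in the majority group and those outside it could serve as the contrasting sets that a skill manager compares, with the agreement fraction acting as a rough confidence signal for the task.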