Unsupervised Skill Discovery for Agentic Data Analysis

Abstract

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, because reliable supervision is expensive and success criteria vary across analytical formats. This raises the key question of how to discover reusable data-analysis skills from unlabeled exploration alone. We propose DataCOPE, an unsupervised verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. Across both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% on report-style tasks and by 32.30% on reasoning-style tasks.
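
To make the idea of grouping trajectories by answer agreement concrete, the following is a minimal Python sketch. It is not DataCOPE's implementation. The abstract does not say how answers are compared or how self-consistency is computed, so the `Trajectory` record, the `normalize` rule, and the use of the majority-group fraction as the self-consistency score are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one exploration run (not the paper's data model)."""
    steps: list[str]
    answer: str


def normalize(answer: str) -> str:
    # Assumed normalization: trim, lowercase, collapse internal whitespace.
    return " ".join(answer.strip().lower().split())


def group_by_agreement(trajectories: list[Trajectory]) -> list[tuple[str, list[Trajectory]]]:
    """Group trajectories by normalized answer, largest group first."""
    groups: dict[str, list[Trajectory]] = defaultdict(list)
    for t in trajectories:
        groups[normalize(t.answer)].append(t)
    # sorted() is stable, so equally sized groups keep first-seen order.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


def agreement_signal(trajectories: list[Trajectory]):
    """Return (majority_answer, self_consistency, agreeing, dissenting).

    self_consistency is assumed here to be the fraction of trajectories
    whose answer falls in the largest agreement group.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    ranked = group_by_agreement(trajectories)
    majority_answer, agreeing = ranked[0]
    dissenting = [t for _, group in ranked[1:] for t in group]
    consistency = len(agreeing) / len(trajectories)
    return majority_answer, consistency, agreeing, dissenting


# Example: two runs agree on "42" (up to whitespace), one dissents.
runs = [
    Trajectory(["filter", "sum"], "42"),
    Trajectory(["sum"], " 42 "),
    Trajectory(["mean"], "17"),
]
answer, score, agreeing, dissenting = agreement_signal(runs)
```

In this sketch the agreeing and dissenting groups could serve as the two sides of a contrastive comparison, with the consistency score used only as a weak, label-free indication of how much to trust the majority answer.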