Unsupervised Skill Discovery for Agentic Data Analysis

Abstract


Inference-time skill augmentation offers a lightweight way to improve data-analytic agents: it injects reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, because reliable supervision is expensive and success criteria vary across analytical formats. This raises a key question: how can reusable data-analysis skills be discovered from unlabeled exploration alone? We propose DataCOPE, an unsupervised, verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from exploration trajectories and uses them to characterize relative quality or agreement among those trajectories. It iteratively coordinates three components: a Data-Analytic Agent that generates trajectories, an Unsupervised Verifier that extracts signals, and a Skill Manager that performs contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and on reasoning-style analysis from DABStep. In both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% on report-style tasks and by 32.30% on reasoning-style tasks.
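The two verifier signals described above can be illustrated with a minimal Python sketch. This is not the DataCOPE implementation. The `Trajectory` structure, the answer normalization, the substring-based coverage check (standing in for a model-based checklist judge), and the all-pairs contrastive pairing are all hypothetical choices made only for illustration. The sketch groups trajectories by agreeing answers, reports self-consistency as the share of trajectories in the largest group, scores a report by the fraction of checklist items it covers, and forms (agreeing, disagreeing) pairs that a skill manager could contrast.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One exploration run (hypothetical structure): final answer plus reasoning trace."""
    answer: str
    trace: str


def normalize(answer: str) -> str:
    # Hypothetical normalization so trivially different answers ("42 ", "42") agree.
    return " ".join(answer.strip().lower().split())


def answer_agreement(trajectories: list[Trajectory]) -> tuple[list[Trajectory], list[Trajectory], float]:
    """Group trajectories by normalized answer.

    Returns (majority group, all other trajectories, self-consistency), where
    self-consistency is the fraction of trajectories that give the most common answer.
    """
    if not trajectories:
        return [], [], 0.0
    groups: dict[str, list[Trajectory]] = defaultdict(list)
    for t in trajectories:
        groups[normalize(t.answer)].append(t)
    counts = Counter({key: len(group) for key, group in groups.items()})
    majority_key, majority_size = counts.most_common(1)[0]
    majority = groups[majority_key]
    minority = [t for key, group in groups.items() if key != majority_key for t in group]
    return majority, minority, majority_size / len(trajectories)


def checklist_coverage(report: str, checklist: list[str]) -> float:
    """Fraction of checklist items covered by a report.

    Assumption: an item counts as covered if its text appears in the report
    (case-insensitive). A real verifier would judge coverage with a model.
    """
    if not checklist:
        return 0.0
    text = report.lower()
    covered = sum(1 for item in checklist if item.lower() in text)
    return covered / len(checklist)


def contrastive_pairs(majority: list[Trajectory], minority: list[Trajectory]) -> list[tuple[Trajectory, Trajectory]]:
    # Assumed pairing scheme: every agreeing trajectory is paired with every
    # disagreeing one, giving a skill manager concrete contrasts to distill from.
    return [(good, bad) for good in majority for bad in minority]
```

For example, three runs answering `"42"`, `" 42"`, and `"41"` yield a majority group of two, a minority group of one, and a self-consistency of 2/3. Treating the majority as the "better" side of each pair is itself an assumption: agreement is only a proxy for correctness.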