Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Abstract

As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or at the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
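The two-stage strategy described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify several details, so the following are assumptions made only for illustration: each sample carries a precomputed `error` and `uncertainty` value; the plane is split at the batch medians; "misconception" is taken to mean high error with low uncertainty, and "calibration" low error with high uncertainty; samples in the other two quadrants are dropped; and the "context-aware" token score is a hypothetical blend of each token's score with its immediate neighbours. All function and field names are invented for this example.

```python
from statistics import median


def smooth_scores(scores, weight=0.5):
    """Blend each token's score with the mean of its immediate neighbours.

    Hypothetical stand-in for a context-aware scoring mechanism.
    """
    smoothed = []
    for i, score in enumerate(scores):
        neighbours = scores[max(i - 1, 0):i] + scores[i + 1:i + 2]
        context = sum(neighbours) / len(neighbours) if neighbours else score
        smoothed.append((1 - weight) * score + weight * context)
    return smoothed


def token_keep_mask(scores, keep_ratio):
    """Keep the top `keep_ratio` fraction of tokens by smoothed score."""
    smoothed = smooth_scores(scores)
    n_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: smoothed[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(scores))]


def q_tuning_sketch(samples, keep_ratio=0.5):
    """Stage 1: quadrant-based sample triage. Stage 2: asymmetric token pruning."""
    # Assumed thresholds: medians of the current batch.
    err_thr = median(s["error"] for s in samples)
    unc_thr = median(s["uncertainty"] for s in samples)

    selected = []
    for s in samples:
        high_err = s["error"] > err_thr
        high_unc = s["uncertainty"] > unc_thr
        n_tokens = len(s["token_scores"])
        if high_err and not high_unc:
            # Assumed "misconception" quadrant: keep the sample, prune its tokens.
            mask = token_keep_mask(s["token_scores"], keep_ratio)
            selected.append((s["id"], "misconception", mask))
        elif high_unc and not high_err:
            # Assumed "calibration" quadrant: keep every token.
            selected.append((s["id"], "calibration", [True] * n_tokens))
        # Samples in the remaining two quadrants are dropped in this sketch.
    return selected


batch = [
    {"id": "a", "error": 2.0, "uncertainty": 0.2, "token_scores": [0.1, 0.9, 0.8, 0.2]},
    {"id": "b", "error": 0.3, "uncertainty": 1.5, "token_scores": [0.5, 0.5, 0.5]},
    {"id": "c", "error": 2.5, "uncertainty": 1.8, "token_scores": [0.1, 0.2]},
    {"id": "d", "error": 0.2, "uncertainty": 0.1, "token_scores": [0.3, 0.4]},
]
result = q_tuning_sketch(batch)
```

In a training loop, the returned masks would typically be used to zero out the loss on pruned tokens, and because the abstract describes the approach as dynamic, the triage would presumably be recomputed as the model changes. Both points are assumptions about usage rather than details confirmed here.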
