Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
September 28, 2025
Authors: Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang
cs.AI
Abstract
As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
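The abstract does not include implementation details, so the following is a minimal Python sketch of the two-stage idea described above: place each sample on the Error-Uncertainty plane, keep only the misconception and calibration quadrants, then prune tokens asymmetrically. All names (`eu_quadrant`, `q_tuning_prune`, `token_keep_ratio`, the `error`/`uncertainty`/`token_scores` fields) and the specific quadrant rule and scoring proxies are illustrative assumptions, not the paper's actual method or API.

```python
import numpy as np

def eu_quadrant(error, uncertainty, err_thresh, unc_thresh):
    """Assign a sample to a quadrant of the Error-Uncertainty (EU) plane.

    One plausible reading (an assumption, not the paper's exact rule):
    'misconception' = high error, low uncertainty (confidently wrong);
    'calibration'   = low error, high uncertainty (correct but unsure).
    """
    if error >= err_thresh and uncertainty < unc_thresh:
        return "misconception"
    if error < err_thresh and uncertainty >= unc_thresh:
        return "calibration"
    return "other"  # everything else is dropped at the sample level


def q_tuning_prune(samples, err_thresh, unc_thresh, token_keep_ratio=0.5):
    """Sketch of the two-stage pruning: sample triage, then asymmetric token pruning."""
    kept = []
    for s in samples:
        quad = eu_quadrant(s["error"], s["uncertainty"], err_thresh, unc_thresh)
        if quad == "other":
            continue  # stage 1: sample-level triage

        scores = np.asarray(s["token_scores"], dtype=float)
        if quad == "misconception":
            # Stage 2: keep only the most salient tokens; per-token scores here
            # stand in for the paper's context-aware scoring mechanism.
            k = max(1, int(len(scores) * token_keep_ratio))
            keep_idx = np.argsort(scores)[-k:]
            mask = np.zeros(len(scores), dtype=bool)
            mask[keep_idx] = True
        else:
            # Calibration samples are preserved in their entirety.
            mask = np.ones(len(scores), dtype=bool)
        kept.append({"sample": s, "token_mask": mask})
    return kept
```

In this sketch the token mask would be applied to the loss during SFT (masked tokens contribute no gradient), which is one common way to realize token-level pruning; how Q-Tuning actually computes its scores and thresholds is specified in the paper itself.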