Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
September 28, 2025
Authors: Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang
cs.AI
Abstract
As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
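The abstract does not include implementation details, so the following is a minimal Python sketch of the two-stage idea described above: place each sample on the Error-Uncertainty plane, keep only the misconception and calibration quadrants, then prune tokens asymmetrically. All names (`eu_quadrant`, `q_tuning_prune`, `token_keep_ratio`, the `error`/`uncertainty`/`token_scores` fields) and the specific quadrant rule and scoring proxies are illustrative assumptions, not the paper's actual method or API.

```python
import numpy as np

def eu_quadrant(error, uncertainty, err_thresh, unc_thresh):
    """Assign a sample to a quadrant of the Error-Uncertainty (EU) plane.

    One plausible reading (an assumption, not the paper's exact rule):
    'misconception' = high error, low uncertainty (confidently wrong);
    'calibration'   = low error, high uncertainty (correct but unsure).
    """
    if error >= err_thresh and uncertainty < unc_thresh:
        return "misconception"
    if error < err_thresh and uncertainty >= unc_thresh:
        return "calibration"
    return "other"  # everything else is dropped at the sample level


def q_tuning_prune(samples, err_thresh, unc_thresh, token_keep_ratio=0.5):
    """Sketch of the two-stage pruning: sample triage, then asymmetric token pruning."""
    kept = []
    for s in samples:
        quad = eu_quadrant(s["error"], s["uncertainty"], err_thresh, unc_thresh)
        if quad == "other":
            continue  # stage 1: sample-level triage

        scores = np.asarray(s["token_scores"], dtype=float)
        if quad == "misconception":
            # Stage 2: keep only the most salient tokens; per-token scores here
            # stand in for the paper's context-aware scoring mechanism.
            k = max(1, int(len(scores) * token_keep_ratio))
            keep_idx = np.argsort(scores)[-k:]
            mask = np.zeros(len(scores), dtype=bool)
            mask[keep_idx] = True
        else:
            # Calibration samples are preserved in their entirety.
            mask = np.ones(len(scores), dtype=bool)
        kept.append({"sample": s, "token_mask": mask})
    return kept
```

In this sketch the token mask would be applied to the loss during SFT (masked tokens contribute no gradient), which is one common way to realize token-level pruning; how Q-Tuning actually computes its scores and thresholds is specified in the paper itself.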