Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
September 28, 2025
Authors: Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang
cs.AI
Abstract
As supervised fine-tuning (SFT) evolves from a lightweight post-training step
into a compute-intensive phase rivaling mid-training in scale, data efficiency
has become critical for aligning large language models (LLMs) under tight
budgets. Existing data pruning methods suffer from a fragmented design: they
operate either at the sample level or the token level in isolation, failing to
jointly optimize both dimensions. This disconnect leads to significant
inefficiencies--high-value samples may still contain redundant tokens, while
token-level pruning often discards crucial instructional or corrective signals
embedded in individual examples. To address this bottleneck, we introduce the
Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes
the heterogeneous utility of training data across samples and tokens. Guided by
this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework
that strategically coordinates sample pruning and token pruning. Q-Tuning
employs a two-stage strategy: first, it performs sample-level triage to retain
examples rich in informative misconceptions or calibration signals; second, it
applies an asymmetric token-pruning policy, using a context-aware scoring
mechanism to trim less salient tokens exclusively from misconception samples
while preserving calibration samples in their entirety. Our method sets a new
state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B,
Q-Tuning achieves a +38% average improvement over the full-data SFT baseline
using only 12.5% of the original training data. As the first dynamic pruning
approach to consistently outperform full-data training, Q-Tuning provides a
practical and scalable blueprint for maximizing data utilization in
budget-constrained LLM SFT.
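
To make the two-stage procedure described in the abstract concrete, the following is a minimal Python sketch of how such a pipeline could be organized. It is not the paper's implementation: the mapping of Error-Uncertainty quadrants to "misconception" and "calibration" samples, the thresholds, and the entropy-weighted saliency score are illustrative assumptions; per-token losses and entropies are assumed to be precomputed with a reference model.

# Hypothetical sketch of a two-stage joint sample/token pruning pipeline
# in the spirit of Q-Tuning. Quadrant assignment, thresholds, and the
# saliency score below are assumptions for illustration only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Sample:
    token_ids: list
    token_loss: np.ndarray      # per-token cross-entropy under a reference model (error proxy)
    token_entropy: np.ndarray   # per-token predictive entropy (uncertainty proxy)
    keep_mask: np.ndarray = None  # tokens that will contribute to the SFT loss

def triage(samples, err_thresh, unc_thresh):
    """Stage 1: sample-level triage on the Error-Uncertainty plane.

    Keeps two quadrants (assumed here): high-error "misconception" samples
    and low-error / high-uncertainty "calibration" samples; drops the rest.
    """
    misconception, calibration = [], []
    for s in samples:
        err, unc = s.token_loss.mean(), s.token_entropy.mean()
        if err >= err_thresh:
            misconception.append(s)
        elif unc >= unc_thresh:
            calibration.append(s)
    return misconception, calibration

def asymmetric_token_prune(misconception, calibration, keep_ratio=0.5):
    """Stage 2: prune low-saliency tokens only from misconception samples;
    calibration samples are preserved in their entirety."""
    for s in misconception:
        # Illustrative context-aware saliency: loss weighted by entropy.
        saliency = s.token_loss * (1.0 + s.token_entropy)
        k = max(1, int(len(saliency) * keep_ratio))
        top = np.argsort(saliency)[-k:]
        mask = np.zeros(len(saliency), dtype=bool)
        mask[top] = True
        s.keep_mask = mask
    for s in calibration:
        s.keep_mask = np.ones(len(s.token_loss), dtype=bool)
    return misconception + calibration

In an actual SFT loop, keep_mask would simply zero out the loss contribution of pruned tokens, so the retained samples and tokens are the only ones that drive the gradient updates.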