
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning

January 20, 2026
Authors: Zhihang Yuan, Chengyu Yue, Long Huang, Litu Ou, Lei Shi
cs.AI

Abstract

Instruction tuning is a standard paradigm for adapting large language models (LLMs), but modern instruction datasets are large, noisy, and redundant, making full-data fine-tuning costly and often unnecessary. Existing data selection methods either build expensive gradient datastores or assign static scores from a weak proxy, largely ignoring evolving uncertainty, and thus missing a key source of LLM interpretability. We propose GRADFILTERING, an objective-agnostic, uncertainty-aware data selection framework that utilizes a small GPT-2 proxy with a LoRA ensemble and aggregates per-example gradients into a Gradient Signal-to-Noise Ratio (G-SNR) utility. Our method matches or surpasses random subsets and strong baselines in most LLM-as-a-judge evaluations as well as in human assessment. Moreover, GRADFILTERING-selected subsets converge faster than competitive filters under the same compute budget, reflecting the benefit of uncertainty-aware scoring.
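To make the scoring step concrete, here is a minimal Python sketch of one plausible reading of the G-SNR utility: per-example gradients from the LoRA-ensemble members of the GPT-2 proxy are stacked, and the norm of their mean (signal, i.e. ensemble agreement) is divided by the norm of their standard deviation (noise, i.e. ensemble disagreement). The function names `g_snr_utility` and `select_top_k`, the flattened-gradient input format, and this exact ratio are illustrative assumptions; the paper's precise definition may differ.

```python
import torch


def g_snr_utility(per_example_grads: list[torch.Tensor], eps: float = 1e-8) -> float:
    """Hypothetical G-SNR score for a single training example.

    `per_example_grads` holds the flattened LoRA-adapter gradients of the
    same example, one tensor per ensemble member of the proxy model.
    """
    grads = torch.stack(per_example_grads)      # (n_members, n_params)
    signal = grads.mean(dim=0).norm()           # agreement across the ensemble
    noise = grads.std(dim=0).norm() + eps       # disagreement (uncertainty proxy)
    return (signal / noise).item()


def select_top_k(scores: list[float], k: int) -> list[int]:
    """Keep the indices of the k highest-utility examples."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Under this reading, examples on which the ensemble members agree on a strong, consistent gradient direction are ranked highest, while examples whose gradients vary widely across members (high uncertainty) or are near zero (redundant) are filtered out.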