SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
October 7, 2024
Authors: Yuxin Xiao, Shujian Zhang, Wenxuan Zhou, Marzyeh Ghassemi, Sanqiang Zhao
cs.AI
Abstract
To induce desired behaviors in large language models (LLMs) for
interaction-driven tasks, the instruction-tuning stage typically trains LLMs on
instruction-response pairs using the next-token prediction (NTP) loss. Previous
work aiming to improve instruction-tuning performance often emphasizes the need
for higher-quality supervised fine-tuning (SFT) datasets, which typically
involves expensive data filtering with proprietary LLMs or labor-intensive data
generation by human annotators. However, these approaches do not fully leverage
the datasets' intrinsic properties, resulting in high computational and labor
costs, thereby limiting scalability and performance gains. In this paper, we
propose SFTMix, a novel recipe that elevates instruction-tuning performance
beyond the conventional NTP paradigm, without the need for well-curated
datasets. Observing that LLMs exhibit uneven confidence across the semantic
representation space, we argue that examples with different confidence levels
should play distinct roles during the instruction-tuning process. Based on this
insight, SFTMix leverages training dynamics to identify examples with varying
confidence levels, then applies a Mixup-based regularization to mitigate
overfitting on confident examples while propagating supervision signals to
improve learning on relatively unconfident ones. This approach enables SFTMix
to significantly outperform NTP across a wide range of instruction-following
and healthcare domain-specific SFT tasks, demonstrating its adaptability to
diverse LLM families and scalability to datasets of any size. Comprehensive
ablation studies further verify the robustness of SFTMix's design choices,
underscoring its versatility in consistently enhancing performance across
different LLMs and datasets in broader natural language processing
applications.
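The abstract names two technical ingredients: identifying confident versus unconfident examples from training dynamics, and applying a Mixup-based regularization between them. The following is a minimal PyTorch-style sketch of those two ideas, not the authors' implementation; the confidence proxy (per-example next-token-prediction loss), the choice to mix at the hidden-representation level, and all helper names are illustrative assumptions, since the abstract does not specify these details.

```python
# Illustrative sketch only (assumptions noted above), not the SFTMix release code.
import torch
import torch.nn.functional as F

def per_example_ntp_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Average next-token-prediction loss per sequence; a lower value is taken
    here as a proxy for higher model confidence on that example (assumption)."""
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len), -100 marks prompt tokens.
    token_loss = F.cross_entropy(
        logits[:, :-1].transpose(1, 2),  # (batch, vocab, seq_len - 1)
        labels[:, 1:],                   # next-token targets
        ignore_index=-100,
        reduction="none",
    )                                    # (batch, seq_len - 1)
    mask = (labels[:, 1:] != -100).float()
    return (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

def mixup_pair(h_conf, y_conf, h_unconf, y_unconf, alpha: float = 0.2):
    """Standard Mixup between one confident and one unconfident example:
    interpolate hidden representations and soft targets with lambda ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    h_mix = lam * h_conf + (1.0 - lam) * h_unconf  # mixed representations
    y_mix = lam * y_conf + (1.0 - lam) * y_unconf  # mixed (soft) target distributions
    return h_mix, y_mix, lam
```

In such a setup, `per_example_ntp_loss` would be tracked over training to split the dataset into confident and relatively unconfident subsets, and the interpolated pair from `mixup_pair` would be trained with a soft-target cross-entropy (equivalently, a lambda-weighted sum of the two original NTP losses), which is the standard way Mixup regularizes against overfitting while propagating supervision across examples.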