A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
October 17, 2024
Authors: Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun
cs.AI
Abstract
Post-training has emerged as a crucial paradigm for adapting large-scale
pre-trained models to various tasks, whose effects are fully reflected by delta
parameters (i.e., the difference between post-trained and pre-trained
parameters). While numerous studies have explored delta parameter properties
via operations like pruning, quantization, low-rank approximation, and
extrapolation, a unified framework for systematically examining these
characteristics has been lacking. In this paper, we propose a novel perspective
based on Riemann sum approximation of the loss function to elucidate delta
parameter editing operations. Our analysis categorizes existing methods into
three classes based on their post-editing performance: competitive, decreased,
and improved, explaining how they are expressed by the Riemann sum
approximation term and how they alter the model performance. Extensive
experiments on both visual and language models, including ViT, LLaMA 3, Qwen 2,
and Mistral, corroborate our theoretical findings. Furthermore, we introduce
extensions to existing techniques such as DARE and BitDelta: we highlight their
limitations in leveraging the properties of delta parameters and reorganize
them into general expressions, enhancing the applicability and effectiveness of
delta parameter editing in post-trained models.
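One way to make the "Riemann sum approximation of the loss function" concrete is sketched here. The notation is assumed for illustration and may differ from the paper's: θ_pre are the pre-trained parameters, Δθ the delta parameters, θ_post = θ_pre + Δθ, and Δθ' the delta after editing (pruned, quantized, low-rank, or extrapolated). The loss change caused by an edit is a line integral of the gradient, which a left Riemann sum with K rectangles approximates:

```latex
\begin{aligned}
\mathcal{L}(\theta_{\mathrm{pre}} + \Delta\theta') - \mathcal{L}(\theta_{\mathrm{post}})
&= \int_0^1 \nabla\mathcal{L}\big(\theta_{\mathrm{post}} + t\,(\Delta\theta' - \Delta\theta)\big)^{\top}(\Delta\theta' - \Delta\theta)\,dt \\
&\approx \frac{1}{K}\sum_{k=1}^{K} \nabla\mathcal{L}\big(\theta_{\mathrm{post}} + t_k\,(\Delta\theta' - \Delta\theta)\big)^{\top}(\Delta\theta' - \Delta\theta),
\qquad t_k = \frac{k-1}{K}.
\end{aligned}
```

With K = 1 the sum reduces to the first-order term ∇L(θ_post)^T (Δθ' - Δθ). Under this reading, an edit whose term stays near zero would be expected to keep performance competitive, a positive term to decrease it, and a negative term to improve it; this mapping onto the three classes is an interpretation, not a quotation of the paper.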
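The abstract names several delta-editing operations without spelling them out. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `dare` follows DARE's drop-and-rescale idea, `bitdelta` uses a sign-times-scale 1-bit quantization with the mean absolute delta as the scale (BitDelta's scale distillation step is omitted), `low_rank` is a truncated SVD, and `extrapolate` simply scales the delta. The quadratic toy loss, the random data, and every function name here are hypothetical, chosen only so that each edit's exact loss change can be compared with its one-rectangle (first-order) Riemann-sum estimate.

```python
import numpy as np

# Delta-editing operations (illustrative sketches, not the authors' code).

def dare(delta, p, rng):
    """DARE-style drop-and-rescale: zero each entry with probability p and
    rescale survivors by 1 / (1 - p) so the expected delta is unchanged.
    Assumes 0 <= p < 1."""
    keep = rng.random(delta.shape) >= p
    return np.where(keep, delta / (1.0 - p), 0.0)

def bitdelta(delta):
    """BitDelta-style 1-bit quantization: sign(delta) times one scale.
    mean(|delta|) minimizes the squared error for this sign pattern when
    delta has no zero entries; scale distillation is omitted."""
    return np.abs(delta).mean() * np.sign(delta)

def low_rank(delta, r):
    """Rank-r truncated-SVD approximation of a delta matrix."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r, :]

def extrapolate(delta, alpha):
    """Scale the delta by alpha; alpha > 1 moves past the post-trained weights."""
    return alpha * delta

# Hypothetical toy loss used to compare exact and Riemann-sum loss changes.

def loss(w, x, y):
    """Halved mean squared error of the linear map x @ w."""
    return 0.5 * np.mean((x @ w - y) ** 2)

def loss_grad(w, x, y):
    """Gradient of loss() with respect to w."""
    return x.T @ (x @ w - y) / y.size

def riemann_loss_change(w_post, step, x, y, k=1):
    """Left Riemann sum (k rectangles) of the line integral of grad L along
    w_post + t * step for t in [0, 1]; k = 1 gives the first-order term."""
    total = 0.0
    for i in range(k):
        t = i / k
        total += np.sum(loss_grad(w_post + t * step, x, y) * step) / k
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 32))
w_pre = rng.standard_normal((32, 16))
w_post = w_pre + 0.05 * rng.standard_normal((32, 16))  # stand-in for post-training
y = x @ w_post + 0.1 * rng.standard_normal((128, 16))  # noisy toy targets
delta = w_post - w_pre  # delta parameters

edits = {
    "dare(p=0.9)": dare(delta, 0.9, rng),
    "bitdelta": bitdelta(delta),
    "low_rank(r=4)": low_rank(delta, 4),
    "extrapolate(1.2)": extrapolate(delta, 1.2),
}
for name, edited in edits.items():
    exact = loss(w_pre + edited, x, y) - loss(w_post, x, y)
    approx = riemann_loss_change(w_post, edited - delta, x, y, k=1)
    print(f"{name:18s} exact={exact:.6f} first-order={approx:.6f}")
```

Raising `k` in `riemann_loss_change` narrows the gap to the exact change; for this quadratic toy loss the left-endpoint error shrinks in proportion to 1/k.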