Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

November 10, 2023
Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
cs.AI

Abstract

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.