Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
November 10, 2023
Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
cs.AI
Abstract
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
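
To make the butterfly idea concrete, the sketch below builds an orthogonal matrix as a product of log2(n) sparse "butterfly" factors, each a permuted direct sum of 2x2 Givens rotations, and uses it to rotate a frozen pretrained weight. This is only a simplified illustration of the principle, not the paper's exact BOFT parameterization (which uses Cayley-parameterized block-orthogonal factors with configurable block size); the function names, the angle parameterization, and the toy dimensions are assumptions made for this example.

```python
import numpy as np

def butterfly_factor(angles, n, level):
    """One butterfly factor: 2x2 rotations between index pairs (i, i ^ 2**level).

    Every index has exactly one partner at this level, so the factor is a
    permuted direct sum of 2x2 rotations and hence orthogonal. Requires n to
    be a power of 2 and len(angles) == n // 2.
    """
    B = np.zeros((n, n))
    stride = 2 ** level
    k = 0
    for i in range(n):
        j = i ^ stride              # partner index at this butterfly level
        if i < j:
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B

def butterfly_orthogonal(all_angles, n):
    """Compose log2(n) butterfly factors into one n x n orthogonal matrix.

    Uses only (n/2) * log2(n) angles instead of the n(n-1)/2 parameters of a
    dense orthogonal parameterization -- the source of the parameter savings.
    """
    R = np.eye(n)
    for level, angles in enumerate(all_angles):
        R = butterfly_factor(angles, n, level) @ R
    return R

# Toy usage: adapt a frozen pretrained weight by an orthogonal left-multiplication.
n = 8
rng = np.random.default_rng(0)
all_angles = 0.1 * rng.normal(size=(int(np.log2(n)), n // 2))  # trainable in practice
R = butterfly_orthogonal(all_angles, n)
print(np.allclose(R @ R.T, np.eye(n)))   # True: the composed matrix is orthogonal
W0 = rng.normal(size=(n, n))             # frozen pretrained weight
W_adapted = R @ W0                       # orthogonally finetuned weight
```

Because each factor is sparse and orthogonal, their product stays orthogonal while the trainable parameter count grows only as O(n log n), mirroring how the Cooley-Tukey FFT routes information through log n sparse stages rather than one dense transform.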