Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
November 10, 2023
Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
cs.AI
Abstract
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
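
To make the butterfly idea concrete, the sketch below builds an orthogonal matrix as a product of log2(n) sparse "butterfly" factors, each a permuted direct sum of 2x2 Givens rotations, and uses it to rotate a frozen pretrained weight. This is only a simplified illustration of the principle, not the paper's exact BOFT parameterization (which uses Cayley-parameterized block-orthogonal factors with configurable block size); the function names, the angle parameterization, and the toy dimensions are assumptions made for this example.

```python
import numpy as np

def butterfly_factor(angles, n, level):
    """One butterfly factor: 2x2 rotations between index pairs (i, i ^ 2**level).

    Every index has exactly one partner at this level, so the factor is a
    permuted direct sum of 2x2 rotations and hence orthogonal. Requires n to
    be a power of 2 and len(angles) == n // 2.
    """
    B = np.zeros((n, n))
    stride = 2 ** level
    k = 0
    for i in range(n):
        j = i ^ stride              # partner index at this butterfly level
        if i < j:
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B

def butterfly_orthogonal(all_angles, n):
    """Compose log2(n) butterfly factors into one n x n orthogonal matrix.

    Uses only (n/2) * log2(n) angles instead of the n(n-1)/2 parameters of a
    dense orthogonal parameterization -- the source of the parameter savings.
    """
    R = np.eye(n)
    for level, angles in enumerate(all_angles):
        R = butterfly_factor(angles, n, level) @ R
    return R

# Toy usage: adapt a frozen pretrained weight by an orthogonal left-multiplication.
n = 8
rng = np.random.default_rng(0)
all_angles = 0.1 * rng.normal(size=(int(np.log2(n)), n // 2))  # trainable in practice
R = butterfly_orthogonal(all_angles, n)
print(np.allclose(R @ R.T, np.eye(n)))   # True: the composed matrix is orthogonal
W0 = rng.normal(size=(n, n))             # frozen pretrained weight
W_adapted = R @ W0                       # orthogonally finetuned weight
```

Because each factor is sparse and orthogonal, their product stays orthogonal while the trainable parameter count grows only as O(n log n), mirroring how the Cooley-Tukey FFT routes information through log n sparse stages rather than one dense transform.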