Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
July 3, 2024
Authors: Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad
cs.AI
Abstract
Pre-trained models produce strong generic representations that can be adapted
via fine-tuning. The learned weight difference relative to the pre-trained
model, known as a task vector, characterises the direction and stride of
fine-tuning. The significance of task vectors is such that simple arithmetic
operations on them can be used to combine diverse representations from
different domains. This paper builds on these properties of task vectors and
aims to answer (1) whether components of task vectors, particularly parameter
blocks, exhibit similar characteristics, and (2) how such blocks can be used to
enhance knowledge composition and transfer. To this end, we introduce aTLAS, an
algorithm that linearly combines parameter blocks with different learned
coefficients, resulting in anisotropic scaling at the task vector level. We
show that such linear combinations explicitly exploit the low intrinsic
dimensionality of pre-trained models, with only a few coefficients being the
learnable parameters. Furthermore, composition of parameter blocks leverages
the already learned representations, thereby reducing the dependency on large
amounts of data. We demonstrate the effectiveness of our method in task
arithmetic, few-shot recognition and test-time adaptation, with supervised or
unsupervised objectives. In particular, we show that (1) learned anisotropic
scaling allows task vectors to be more disentangled, causing less interference
in composition; (2) task vector composition excels with scarce or no labeled
data and is less prone to domain shift, thus leading to better
generalisability; (3) mixing the most informative parameter blocks across
different task vectors prior to training can reduce the memory footprint and
improve the flexibility of knowledge transfer. Moreover, we show the potential
of aTLAS as a PEFT method, particularly with less data, and demonstrate its
scalability.
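
To make the core mechanism concrete, below is a minimal sketch of learned anisotropic scaling over task vectors. It is an illustrative reconstruction in PyTorch, not the authors' released implementation: the class name AnisotropicComposer and the toy shapes are hypothetical, and a task vector is assumed to be a dict of per-block weight deltas (fine-tuned minus pre-trained weights).

```python
import torch
import torch.nn as nn

# Sketch of anisotropic task-vector scaling (assumed data layout):
# each task vector maps parameter-block names to weight deltas
# (fine-tuned weights minus pre-trained weights). One scalar is
# learned per (task vector, block) pair; model weights stay frozen.

class AnisotropicComposer(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, pretrained_state, task_vectors):
        super().__init__()
        self.pretrained = pretrained_state        # dict: block name -> tensor
        self.task_vectors = task_vectors          # list of such dicts
        self.block_names = list(pretrained_state.keys())
        # The only learnable parameters: one coefficient per task vector
        # per parameter block (K x B scalars in total).
        self.coeffs = nn.Parameter(
            torch.zeros(len(task_vectors), len(self.block_names))
        )

    def composed_state(self):
        # theta = theta_pre + sum_k sum_b coeffs[k, b] * tau_k[block b]
        state = {}
        for b, name in enumerate(self.block_names):
            delta = sum(
                self.coeffs[k, b] * tv[name]
                for k, tv in enumerate(self.task_vectors)
            )
            state[name] = self.pretrained[name] + delta
        return state

# Toy usage: compose two task vectors for a single linear block.
pre = {"weight": torch.randn(2, 4), "bias": torch.zeros(2)}
tvs = [{k: 0.1 * torch.randn_like(v) for k, v in pre.items()}
       for _ in range(2)]
composer = AnisotropicComposer(pre, tvs)
optimiser = torch.optim.Adam(composer.parameters(), lr=1e-2)  # 4 scalars only
state = composer.composed_state()
out = torch.nn.functional.linear(torch.randn(8, 4),
                                 state["weight"], state["bias"])
```

With K task vectors and B parameter blocks, only K x B coefficients are optimised, which is how a linear combination of this form exploits the low intrinsic dimensionality of pre-trained models noted in the abstract.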