Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Abstract

Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. Task vectors are significant because simple arithmetic operations on them can combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters. Furthermore, composition of parameter blocks leverages the already learned representations, thereby reducing the dependency on large amounts of data. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives. In particular, we show that (1) learned anisotropic scaling allows task vectors to be more disentangled, causing less interference in composition; (2) task vector composition excels with scarce or no labeled data and is less prone to domain shift, thus leading to better generalisability; (3) mixing the most informative parameter blocks across different task vectors prior to training can reduce the memory footprint and improve the flexibility of knowledge transfer. Moreover, we show the potential of aTLAS as a parameter-efficient fine-tuning (PEFT) method, particularly when data is scarce, and demonstrate its scalability.
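The idea of scaling each parameter block of a task vector with its own coefficient can be illustrated with a small sketch. This is not the authors' implementation: it assumes the composed weights are the pre-trained weights plus a sum of per-block scaled task vectors, it uses plain Python lists as a stand-in for weight tensors, and it fixes the coefficients by hand rather than learning them. The block names `attn` and `mlp` and the helper names `task_vector` and `compose` are hypothetical.

```python
from typing import Dict, List

Block = List[float]        # one flattened parameter block (toy stand-in for a tensor)
Params = Dict[str, Block]  # block name -> parameters


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector: fine-tuned weights minus pre-trained weights, block by block."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def compose(pretrained: Params,
            task_vectors: List[Params],
            coeffs: List[Dict[str, float]]) -> Params:
    """Add a linear combination of task vectors to the pre-trained weights.

    coeffs[i][name] scales block `name` of task vector i. Giving each block
    its own coefficient makes the scaling anisotropic at the task-vector
    level; using one shared value per task vector recovers plain task
    arithmetic.
    """
    composed = {name: list(block) for name, block in pretrained.items()}
    for tv, lam in zip(task_vectors, coeffs):
        for name, delta in tv.items():
            scale = lam[name]
            composed[name] = [w + scale * d for w, d in zip(composed[name], delta)]
    return composed


# Toy model with two parameter blocks (hypothetical names and values).
theta_0 = {"attn": [0.0, 0.0], "mlp": [1.0, 1.0]}   # pre-trained
theta_a = {"attn": [1.0, 0.0], "mlp": [1.0, 3.0]}   # fine-tuned on task A
theta_b = {"attn": [0.0, 2.0], "mlp": [2.0, 1.0]}   # fine-tuned on task B

tau_a = task_vector(theta_a, theta_0)
tau_b = task_vector(theta_b, theta_0)

# In the method these coefficients would be learned; here they are set by hand.
lambdas = [{"attn": 1.0, "mlp": 0.5}, {"attn": 0.5, "mlp": 0.0}]
theta = compose(theta_0, [tau_a, tau_b], lambdas)
print(theta)
```

In this sketch the only free quantities are the entries of `lambdas`, one per block per task vector, which mirrors the abstract's point that only a few coefficients need to be learned while the task vectors themselves stay fixed.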
