MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
November 20, 2023
Authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
cs.AI
Abstract
LoRA achieves remarkable resource efficiency and comparable performance when
adapting LLMs for specific tasks. Since ChatGPT demonstrated superior
performance on various tasks, there has been a growing desire to adapt one
model for all tasks. However, the explicit low-rank constraint of LoRA limits
adaptation performance in complex multi-task scenarios: LoRA updates are
dominated by a small number of top singular vectors, whereas fine-tuning
decomposes into a set of less important unitary transforms. In this paper, we
propose MultiLoRA for better multi-task adaptation by reducing the dominance
of the top singular vectors observed in LoRA. MultiLoRA scales LoRA modules
horizontally and changes the parameter initialization of the adaptation
matrices to reduce parameter dependency, thus yielding more balanced unitary
subspaces. We construct, for the first time, specialized training data by
mixing datasets of instruction following, natural language understanding, and
world knowledge to cover semantically and syntactically different samples.
With only 2.5% additional parameters, MultiLoRA outperforms single-LoRA
counterparts and fine-tuning across multiple benchmarks and model scales.
Further investigation into the weight update matrices of MultiLoRA reveals
reduced dependency on top singular vectors and more democratic contributions
of unitary transforms.
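
A minimal PyTorch sketch of what "scaling LoRA modules horizontally" could look like: the weight update is split across several parallel low-rank branches whose outputs are summed, instead of a single B·A product. The module and parameter names, branch count, and initialization details below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiLoRALinear(nn.Module):
    """Hypothetical sketch: a frozen linear layer with several parallel LoRA branches."""

    def __init__(self, in_features, out_features, rank=8, num_branches=4, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight

        self.scaling = alpha / rank
        branch_rank = max(rank // num_branches, 1)  # total rank budget split across branches
        # Parallel down-projections (A_i) and up-projections (B_i).
        # Initialization is illustrative; the paper changes it to reduce parameter dependency.
        self.down = nn.ParameterList(
            [nn.Parameter(torch.randn(branch_rank, in_features) * 0.01)
             for _ in range(num_branches)]
        )
        self.up = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, branch_rank))
             for _ in range(num_branches)]
        )

    def forward(self, x):
        out = self.base(x)
        # Sum the contributions of all low-rank branches: W x + s * sum_i B_i A_i x
        for A, B in zip(self.down, self.up):
            out = out + self.scaling * (x @ A.t() @ B.t())
        return out
```

Summing several smaller branches keeps the added parameter count close to a single LoRA of the same total rank, while giving each branch its own independently initialized subspace, which is consistent with the abstract's goal of more balanced unitary subspaces.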