MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
November 20, 2023
Authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
cs.AI
Abstract
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low-rank constraint of LoRA limits adaptation performance in complex multi-task scenarios: LoRA's weight update is dominated by a small number of top singular vectors, whereas fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of the top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and changes the parameter initialization of the adaptation matrices to reduce parameter dependency, thus yielding more balanced unitary subspaces. We construct, for the first time, specialized training data by mixing instruction-following, natural language understanding, and world-knowledge datasets to cover semantically and syntactically different samples. With only 2.5% additional parameters, MultiLoRA outperforms single LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into the weight update matrices of MultiLoRA reveals reduced dependency on top singular vectors and more democratic contributions from unitary transforms.
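
As a rough illustration of the horizontal scaling described in the abstract, the sketch below implements a frozen linear layer with several parallel low-rank branches whose outputs are summed with learnable per-branch scaling factors. This is a minimal sketch assuming PyTorch; the module name `MultiLoRALinear`, the branch count, and the initialization choices (non-zero init for the B matrices plus learnable scalings) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal MultiLoRA-style linear layer (illustrative sketch, assuming PyTorch;
# branch count and initialization details are assumptions, not the paper's code).
import math
import torch
import torch.nn as nn


class MultiLoRALinear(nn.Module):
    """Frozen linear layer with n parallel low-rank adaptation branches."""

    def __init__(self, in_features, out_features, rank=8, n_branches=3, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.scale = alpha / rank

        # n_branches parallel (A, B) pairs instead of one wide LoRA module.
        self.A = nn.ParameterList(
            [nn.Parameter(torch.empty(rank, in_features)) for _ in range(n_branches)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.empty(out_features, rank)) for _ in range(n_branches)]
        )
        # Learnable per-branch scaling to balance branch contributions.
        self.branch_scale = nn.Parameter(torch.ones(n_branches))

        for a, b in zip(self.A, self.B):
            nn.init.kaiming_uniform_(a, a=math.sqrt(5))
            # Non-zero init for B (unlike vanilla LoRA's zero init) to reduce
            # dependency among branch parameters -- an assumption for this sketch.
            nn.init.normal_(b, std=0.02)

    def forward(self, x):
        out = self.base(x)
        for i, (a, b) in enumerate(zip(self.A, self.B)):
            # Each branch adds a rank-r update: x @ A_i^T @ B_i^T, scaled per branch.
            out = out + self.branch_scale[i] * self.scale * (x @ a.T @ b.T)
        return out


# Usage: wrap a projection of a transformer layer and train only the LoRA parameters.
layer = MultiLoRALinear(in_features=768, out_features=768)
y = layer(torch.randn(4, 768))
print(y.shape)  # torch.Size([4, 768])
```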