MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
November 20, 2023
Authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
cs.AI
Abstract
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low-rank constraint of LoRA limits adaptation performance in complex multi-task scenarios: LoRA's weight update is dominated by a small number of top singular vectors, whereas fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of the top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and changes the parameter initialization of the adaptation matrices to reduce parameter dependency, thus yielding more balanced unitary subspaces. We construct, for the first time, specialized training data by mixing instruction-following, natural language understanding, and world-knowledge datasets to cover semantically and syntactically different samples. With only 2.5% additional parameters, MultiLoRA outperforms single LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into the weight update matrices of MultiLoRA reveals reduced dependency on top singular vectors and more democratic contributions from unitary transforms.
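
As a rough illustration of the horizontal scaling described in the abstract, the sketch below implements a frozen linear layer with several parallel low-rank branches whose outputs are summed with learnable per-branch scaling factors. This is a minimal sketch assuming PyTorch; the module name `MultiLoRALinear`, the branch count, and the initialization choices (non-zero init for the B matrices plus learnable scalings) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal MultiLoRA-style linear layer (illustrative sketch, assuming PyTorch;
# branch count and initialization details are assumptions, not the paper's code).
import math
import torch
import torch.nn as nn


class MultiLoRALinear(nn.Module):
    """Frozen linear layer with n parallel low-rank adaptation branches."""

    def __init__(self, in_features, out_features, rank=8, n_branches=3, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.scale = alpha / rank

        # n_branches parallel (A, B) pairs instead of one wide LoRA module.
        self.A = nn.ParameterList(
            [nn.Parameter(torch.empty(rank, in_features)) for _ in range(n_branches)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.empty(out_features, rank)) for _ in range(n_branches)]
        )
        # Learnable per-branch scaling to balance branch contributions.
        self.branch_scale = nn.Parameter(torch.ones(n_branches))

        for a, b in zip(self.A, self.B):
            nn.init.kaiming_uniform_(a, a=math.sqrt(5))
            # Non-zero init for B (unlike vanilla LoRA's zero init) to reduce
            # dependency among branch parameters -- an assumption for this sketch.
            nn.init.normal_(b, std=0.02)

    def forward(self, x):
        out = self.base(x)
        for i, (a, b) in enumerate(zip(self.A, self.B)):
            # Each branch adds a rank-r update: x @ A_i^T @ B_i^T, scaled per branch.
            out = out + self.branch_scale[i] * self.scale * (x @ a.T @ b.T)
        return out


# Usage: wrap a projection of a transformer layer and train only the LoRA parameters.
layer = MultiLoRALinear(in_features=768, out_features=768)
y = layer(torch.randn(4, 768))
print(y.shape)  # torch.Size([4, 768])
```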