LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
April 29, 2024
Authors: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
cs.AI
Abstract
Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted
methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models
(LLMs). LoRA reduces the number of trainable parameters and memory usage while
achieving comparable performance to full fine-tuning. We aim to assess the
viability of training and serving LLMs fine-tuned with LoRA in real-world
applications. First, we measure the quality of LLMs fine-tuned with quantized
low rank adapters across 10 base models and 31 tasks for a total of 310 models.
We find that 4-bit LoRA fine-tuned models outperform base models by 34 points
and GPT-4 by 10 points on average. Second, we investigate the most effective
base models for fine-tuning and assess the correlative and predictive
capacities of task complexity heuristics in forecasting the outcomes of
fine-tuning. Finally, we evaluate the latency and concurrency capabilities of
LoRAX, an open-source Multi-LoRA inference server that facilitates the
deployment of multiple LoRA fine-tuned models on a single GPU using shared base
model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web
application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA
A100 GPU with 80GB memory. LoRA Land highlights the quality and
cost-effectiveness of employing multiple specialized LLMs over a single,
general-purpose LLM.
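To make the setup described in the abstract concrete, below is a minimal sketch of 4-bit (QLoRA-style) LoRA fine-tuning using the Hugging Face transformers, peft, and bitsandbytes libraries. This is an illustrative approximation under stated assumptions, not the paper's actual training stack: the base model identifier is real, but the LoRA rank, alpha, dropout, and target modules shown here are assumed values, not settings reported by the authors.

```python
# Minimal sketch: 4-bit quantized base model + LoRA adapter (QLoRA-style).
# Assumptions: Hugging Face transformers/peft/bitsandbytes stack; the rank,
# alpha, dropout, and target modules below are illustrative, not the paper's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model_id = "mistralai/Mistral-7B-v0.1"  # one of the base models used in the study

# Load the base model with 4-bit NF4 quantization to cut memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach a low-rank adapter; only these small matrices are trained,
# while the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumed value)
    lora_alpha=16,                        # assumed scaling factor
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of all weights

# After training, the adapter can be saved on its own:
# model.save_pretrained("adapters/my-task-adapter")
```

Because each task yields only a small adapter rather than a full copy of the model, many such adapters can share one set of base weights at inference time, which is what lets a multi-LoRA server like LoRAX host 25 fine-tuned Mistral-7B adapters on a single 80GB A100.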