Full Parameter Fine-tuning for Large Language Models with Limited Resources

Abstract

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), but training them demands massive GPU resources. Lowering the barrier to LLM training would encourage broader participation from researchers, benefiting both academia and society. Existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters; few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update into one step to reduce memory usage. By integrating LOMO with existing memory-saving techniques, we reduce memory usage to 10.8% of that of the standard approach (the DeepSpeed solution). Consequently, our approach enables full parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090 GPUs, each with 24GB of memory.
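To make the idea of fusing the gradient computation with the parameter update more concrete, the following is a toy, pure-Python sketch. It is not the authors' implementation: the function names, the chain of scalar layers, the squared-error loss, and the plain SGD update rule are all assumptions chosen for illustration. The baseline keeps one gradient per parameter until the backward pass finishes, while the fused variant applies each update as soon as its gradient is available and then drops it, so only one parameter gradient is alive at any time.

```python
def forward(weights, x):
    """Activations of a chain of scalar linear layers: h_i = w_i * h_{i-1}."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def standard_sgd_step(weights, x, target, lr):
    """Baseline: store every gradient first, then update all parameters."""
    acts = forward(weights, x)
    grad_out = acts[-1] - target           # dL/dy for L = 0.5 * (y - target)^2
    grads = [0.0] * len(weights)           # one stored gradient per parameter
    for i in reversed(range(len(weights))):
        grads[i] = grad_out * acts[i]      # dL/dw_i
        grad_out = grad_out * weights[i]   # dL/dh_{i-1}
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, len(grads)         # second value: gradients held at once


def fused_sgd_step(weights, x, target, lr):
    """Fused: update each parameter during the backward pass, then drop its gradient."""
    weights = list(weights)                # do not mutate the caller's list
    acts = forward(weights, x)
    grad_out = acts[-1] - target
    peak_live_grads = 0
    for i in reversed(range(len(weights))):
        grad_w = grad_out * acts[i]        # gradient for this layer only
        grad_out = grad_out * weights[i]   # propagate using the pre-update weight
        weights[i] -= lr * grad_w          # immediate in-place update
        peak_live_grads = max(peak_live_grads, 1)  # grad_w is the only live one
    return weights, peak_live_grads


if __name__ == "__main__":
    w0 = [0.5, -1.2, 2.0]
    print(standard_sgd_step(w0, x=1.5, target=0.3, lr=0.01))
    print(fused_sgd_step(w0, x=1.5, target=0.3, lr=0.01))
```

Under these assumptions both functions produce the same updated weights; the difference is only in how many parameter gradients must be held at the same time. In the sketch the gradient with respect to the layer input is computed before the weight is overwritten, which is what keeps the fused result identical to the baseline.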
