EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Abstract

Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users because it requires significantly more memory than inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To address the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm that corrects the fine-tuned LoRA module so that it can be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
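
The abstract names activation-aware SVD as the way the emulator is built but does not spell out the procedure. The following is a minimal sketch of one common formulation, not necessarily the authors' implementation: a weight matrix is factored into two low-rank matrices so that the output error on calibration activations is minimized, by whitening with the Cholesky factor of the activation Gram matrix before truncating the SVD. The function names (`activation_aware_svd`, `plain_truncated_svd`, `calibration_error`), the ridge term `eps`, and the whitening choice are all assumptions made for illustration. The LoRA correction step is not sketched, since the abstract gives no details of it.

```python
import numpy as np


def activation_aware_svd(W, X, rank, eps=1e-6):
    """Factor W (out_dim x in_dim) into A @ B using calibration activations X.

    A has shape (out_dim, rank) and B has shape (rank, in_dim). Up to the small
    ridge term, the factors minimize ||X @ W.T - X @ (A @ B).T||_F, i.e. the
    layer's output error on the calibration set X (n_samples x in_dim).
    This is an illustrative formulation, assumed rather than taken from the paper.
    """
    in_dim = W.shape[1]
    gram = X.T @ X
    # Ridge term (an assumption) keeps the Gram matrix positive definite.
    gram = gram + eps * (np.trace(gram) / in_dim) * np.eye(in_dim)
    S = np.linalg.cholesky(gram)  # lower triangular, gram = S @ S.T
    # Truncating the SVD of W @ S is optimal for the activation-weighted error.
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]            # scale the kept left vectors
    B = np.linalg.solve(S.T, Vt[:rank].T).T   # equals Vt[:rank] @ inv(S)
    return A, B


def plain_truncated_svd(W, rank):
    """Baseline: truncated SVD of W that ignores the activations."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * sigma[:rank], Vt[:rank]


def calibration_error(W, A, B, X):
    """Frobenius norm of the output mismatch on the calibration activations."""
    return np.linalg.norm(X @ W.T - X @ (A @ B).T)
```

In this sketch the compressed layer stores `rank * (out_dim + in_dim)` values instead of `out_dim * in_dim`, so the choice of `rank` sets the compression ratio. A LoRA adapter would then be trained on top of the frozen factors `A` and `B` in place of the original weight.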
