EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
June 13, 2025
Authors: Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang
cs.AI
Abstract
Open-source foundation models have seen rapid adoption and development,
enabling powerful general-purpose capabilities across diverse domains. However,
fine-tuning large foundation models for domain-specific or personalized tasks
remains prohibitively expensive for most users due to the significant memory
overhead beyond that of inference. We introduce EMLoC, an Emulator-based
Memory-efficient fine-tuning framework with LoRA Correction, which enables
model fine-tuning within the same memory budget required for inference. EMLoC
constructs a task-specific light-weight emulator using activation-aware
singular value decomposition (SVD) on a small downstream calibration set.
Fine-tuning is then performed on this lightweight emulator via LoRA. To address
the misalignment between the original model and the compressed emulator, we
propose a novel compensation algorithm that corrects the fine-tuned LoRA module
so it can be seamlessly merged into the original model for inference. EMLoC supports
flexible compression ratios and standard training pipelines, making it
adaptable to a wide range of applications. Extensive experiments demonstrate
that EMLoC outperforms other baselines across multiple datasets and modalities.
Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a
single 24GB consumer GPU, bringing efficient and practical model adaptation to
individual users.
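
The abstract's emulator-construction step can be illustrated with a minimal sketch: compress one linear layer's weight with activation-aware SVD, using statistics from a small calibration set. The per-input-channel scaling below (mean activation magnitude, ASVD-style) and all function names are illustrative assumptions; EMLoC's exact construction and its LoRA-correction algorithm are described in the paper, not here.

```python
# Minimal sketch of activation-aware SVD for building a low-rank "emulator"
# of one linear layer, under assumed diagonal activation scaling.
import torch

def activation_aware_svd(W: torch.Tensor, X_calib: torch.Tensor, rank: int):
    """Compress W (out_dim x in_dim) into a rank-`rank` factorization A @ B.

    X_calib: calibration activations of shape (num_tokens, in_dim), collected
    by running the small downstream calibration set through the original model.
    """
    # Per-input-channel importance: average activation magnitude (assumption).
    s = X_calib.abs().mean(dim=0).clamp_min(1e-6)        # (in_dim,)
    # SVD of the activation-scaled weight, so truncation preserves the
    # directions that matter most for the calibration data.
    U, S, Vh = torch.linalg.svd(W * s, full_matrices=False)
    U_k, S_k, Vh_k = U[:, :rank], S[:rank], Vh[:rank, :]
    # Fold singular values into the factors and undo the scaling:
    # W ≈ A @ B with A: (out_dim, rank), B: (rank, in_dim).
    A = U_k * S_k
    B = Vh_k / s
    return A, B

# Example: build a low-rank emulator for a single 4096x4096 weight matrix.
torch.manual_seed(0)
W = torch.randn(4096, 4096)          # original layer weight
X_calib = torch.randn(512, 4096)     # activations from the calibration set
A, B = activation_aware_svd(W, X_calib, rank=256)
print("emulator params:", A.numel() + B.numel(), "original:", W.numel())
# LoRA fine-tuning then runs on the emulator (A @ B); the learned LoRA update
# is afterwards corrected by the paper's compensation algorithm so it can be
# merged into the original, uncompressed W for inference.
```

Because only the low-rank factors (plus LoRA parameters) reside in memory during training, the fine-tuning footprint stays close to that of inference on the compressed model, which is the memory budget the abstract targets.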