EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Abstract

Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users because it incurs significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To address the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm that corrects the fine-tuned LoRA module so that it can be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
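To make the emulator-construction step more concrete, the following is a minimal sketch of one common way to formulate activation-aware SVD for a single linear layer. It is an illustration under stated assumptions, not the exact procedure used by EMLoC: the function name `activation_aware_low_rank`, the Cholesky-based whitening, and the regularizer `eps` are all hypothetical choices made here. Given a weight matrix `W` and calibration activations `X`, it looks for a rank-`rank` factorization that minimizes the error on the layer's outputs for those activations rather than the error on the weights themselves.

```python
import numpy as np


def activation_aware_low_rank(W, X, rank, eps=1e-6):
    """Return factors (A, B) with A @ B approximating W at the given rank.

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    The factors minimize ||X W^T - X (A B)^T||_F up to the small
    regularizer eps (an assumption added here for numerical stability).
    """
    in_features = X.shape[1]
    # Gram matrix of the calibration activations, regularized so that
    # the Cholesky factorization always exists.
    gram = X.T @ X + eps * np.eye(in_features)
    S = np.linalg.cholesky(gram)  # gram = S @ S.T, S lower-triangular

    # Truncated SVD of the weight expressed in whitened input coordinates.
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]  # (out_features, rank)

    # B = Vt[:rank] @ inv(S), computed with a linear solve instead of an
    # explicit inverse.
    B = np.linalg.solve(S.T, Vt[:rank].T).T  # (rank, in_features)
    return A, B
```

In this sketch, a compressed layer would store `A` and `B` in place of `W`, so the parameter count drops from `out_features * in_features` to `rank * (out_features + in_features)`. Input directions that the calibration activations rarely excite contribute less to the objective, which is what makes such a factorization task-specific.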

