ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

Abstract

Parameter-efficient fine-tuning (PEFT) reduces the cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while keeping the pretrained backbone frozen. However, existing approaches such as Low-Rank Adaptation (LoRA) achieve adaptation by inserting independent low-rank perturbations into individual weight matrices, which yields a local parameterization of adaptation. We propose ShadowPEFT, a centralized PEFT framework that instead performs layer-level refinement through a depth-shared shadow module. At each transformer layer, ShadowPEFT maintains a parallel shadow state and evolves it iteratively to produce progressively richer hidden states. This design shifts adaptation from distributed weight-space perturbations to a shared layer-space refinement process. Because the shadow module is decoupled from the backbone, it can be reused across depth, pretrained independently, and optionally deployed in a detached mode, which benefits edge-computing scenarios. Experiments on generation and understanding benchmarks show that ShadowPEFT matches or outperforms LoRA and DoRA under comparable trainable-parameter budgets. Additional analyses covering shadow pretraining, cross-dataset transfer, parameter scaling, inference latency, and system-level evaluation suggest that centralized layer-space adaptation is a competitive and flexible alternative to conventional low-rank PEFT.
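The abstract does not specify the internals of the shadow module, so the following is only a toy sketch of the contrast it describes, built on hypothetical assumptions. The frozen "backbone" is a stack of fixed residual tanh layers standing in for transformer layers, and the hypothetical `ShadowModule` (parameters `U`, `V`, `G`) is one set of weights reused at every depth: it evolves a shadow state `s` from the previous shadow state and the current hidden state, then adds a correction back into the hidden state. This update rule is an illustrative assumption, not the paper's design. For comparison, `lora_params` uses the standard LoRA count of `r * (d_in + d_out)` trainable parameters per adapted `d_out x d_in` matrix, which grows linearly with depth.

```python
import numpy as np


class FrozenBackbone:
    """Hypothetical stand-in for a frozen transformer: each layer is a fixed residual tanh map."""

    def __init__(self, num_layers: int, d: int, rng: np.random.Generator):
        self.weights = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(num_layers)]

    def layer(self, l: int, h: np.ndarray) -> np.ndarray:
        return h + np.tanh(h @ self.weights[l])


class ShadowModule:
    """One depth-shared module (hypothetical parameterization, assumed for illustration)."""

    def __init__(self, d: int, d_shadow: int, rng: np.random.Generator):
        self.U = rng.normal(scale=d_shadow ** -0.5, size=(d_shadow, d_shadow))  # shadow -> shadow
        self.V = rng.normal(scale=d ** -0.5, size=(d, d_shadow))                # hidden -> shadow
        self.G = np.zeros((d_shadow, d))  # shadow -> hidden; zero init makes the module a no-op at start

    def num_params(self) -> int:
        # Independent of the number of backbone layers, since the same weights are reused.
        return self.U.size + self.V.size + self.G.size

    def step(self, s: np.ndarray, h: np.ndarray):
        s_next = np.tanh(s @ self.U + h @ self.V)  # evolve the parallel shadow state
        return s_next, h + s_next @ self.G        # refine the layer output in hidden space


def backbone_forward(backbone: FrozenBackbone, h: np.ndarray, num_layers: int) -> np.ndarray:
    """Run the frozen backbone alone, with no adaptation."""
    for l in range(num_layers):
        h = backbone.layer(l, h)
    return h


def shadow_forward(backbone: FrozenBackbone, shadow: ShadowModule,
                   h: np.ndarray, num_layers: int) -> np.ndarray:
    """Interleave frozen layers with refinement by the single shared shadow module."""
    s = np.zeros((h.shape[0], shadow.U.shape[0]))  # initial shadow state (assumed zero)
    for l in range(num_layers):
        h = backbone.layer(l, h)   # frozen backbone computation
        s, h = shadow.step(s, h)   # same shadow parameters at every depth
    return h


def lora_params(num_layers: int, matrices_per_layer: int, d: int, r: int) -> int:
    """Standard LoRA count for square d x d matrices: r * (d_in + d_out) per adapted matrix."""
    return num_layers * matrices_per_layer * r * (d + d)
```

With `d = 16`, four layers, and a shadow width of 8, the shared module holds 320 parameters regardless of depth, whereas rank-4 LoRA on four `16 x 16` matrices per layer needs 2,048, and that figure doubles if the depth doubles. Zero-initializing `G` is a common choice for adapters, because it makes the adapted model start out identical to the frozen backbone.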
