LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

Abstract

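Large Language Models (LLMs) have transformed the landscape of artificial intelligence, but their enormous size poses significant challenges in computational cost. We introduce LoRAShear, a novel, efficient approach to structurally prune LLMs and recover their knowledge. Given a general LLM, LoRAShear first creates dependency graphs to discover the minimally removable structures and analyze the knowledge distribution. It then performs progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear studies and proposes a dynamic fine-tuning scheme with dynamic data adaptors that effectively narrows the performance gap to the full models. Numerical results demonstrate that, using only one GPU within a couple of GPU days, LoRAShear reduces the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms the state of the art. The source code will be available at https://github.com/microsoft/lorashear.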
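
To make the idea of structured pruning on a LoRA-adapted layer concrete, the following is a minimal sketch, not the LoRAShear algorithm itself. It assumes a single linear layer with frozen weight W and LoRA factors B and A, and it swaps in a plain magnitude criterion: the output rows of the effective weight W + BA with the smallest L2 norm are removed. The function names `matmul` and `prune_output_rows`, the toy matrices, and the norm-based score are all hypothetical choices for illustration; the dependency-graph analysis, progressive schedule, and knowledge recovery described in the abstract are not modeled here.

```python
import math


def matmul(b, a):
    """Multiply b (m x r) by a (r x n); both are lists of rows."""
    return [
        [sum(b_row[k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for b_row in b
    ]


def prune_output_rows(w, lora_b, lora_a, prune_ratio):
    """Drop the lowest-norm output rows of the effective weight W + B @ A.

    Illustrative assumption: importance is the L2 norm of each effective row.
    Returns the pruned W, the pruned B, and the sorted indices of kept rows.
    A is returned untouched because it does not depend on the output dimension.
    """
    delta = matmul(lora_b, lora_a)
    scores = [
        math.sqrt(sum((w_ij + d_ij) ** 2 for w_ij, d_ij in zip(w_row, d_row)))
        for w_row, d_row in zip(w, delta)
    ]
    n_prune = int(len(w) * prune_ratio)
    # Sort row indices by ascending score; the first n_prune rows are removed.
    order = sorted(range(len(w)), key=lambda i: scores[i])
    keep = sorted(order[n_prune:])
    return [w[i] for i in keep], [lora_b[i] for i in keep], keep


# Toy layer: 5 output rows, 2 input columns, LoRA rank 1 (hypothetical values).
w = [[1.0, 0.0], [0.0, 0.1], [2.0, 2.0], [0.5, 0.5], [3.0, 0.0]]
lora_b = [[0.0], [0.0], [1.0], [0.0], [0.0]]
lora_a = [[1.0, 1.0]]

# Remove 20% of the output rows, mirroring the 20% reduction in the abstract.
pruned_w, pruned_b, kept = prune_output_rows(w, lora_b, lora_a, 0.2)
```

Removing a whole row from both W and B is what makes the pruning structured: the layer genuinely shrinks, whereas zeroing individual weights would leave its shape unchanged. In a real network, the matching input column of every layer that consumes this output would also have to be removed, which is the kind of coupling the dependency graphs are meant to capture.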

