Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

Abstract

Despite the dominance and effectiveness of scaling, which has produced large networks with hundreds of billions of parameters, the necessity of training overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which uses low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate performance comparable to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
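
The idea that low-rank updates can add up to a high-rank change rests on a standard linear-algebra fact: the rank of a sum is at most the sum of the ranks, so several rank-r updates together can reach rank well above r. The sketch below illustrates only that fact. It is not the ReLoRA implementation, and it assumes that updates are accumulated by plain summation into one matrix. All function names (`rank`, `low_rank_update`, `ranks_after_each_merge`) are hypothetical helpers introduced for this illustration.

```python
import random
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Exact rank via Gaussian elimination over rational numbers."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def low_rank_update(b, a):
    """Return B @ A, where B is n x r and A is r x n, so the rank is at most r."""
    return matmul(b, a)


def ranks_after_each_merge(updates):
    """Sum the updates one at a time and record the rank of the running total."""
    n_rows, n_cols = len(updates[0]), len(updates[0][0])
    total = [[0] * n_cols for _ in range(n_rows)]
    ranks = []
    for u in updates:
        total = [[x + y for x, y in zip(rt, ru)] for rt, ru in zip(total, u)]
        ranks.append(rank(total))
    return ranks


if __name__ == "__main__":
    # Assumed toy sizes: an 8 x 8 matrix, rank-2 updates, four merges.
    rng = random.Random(0)
    n, r = 8, 2
    updates = []
    for _ in range(4):
        b = [[rng.randint(-3, 3) for _ in range(r)] for _ in range(n)]
        a = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(r)]
        updates.append(low_rank_update(b, a))
    print("rank of each update:", [rank(u) for u in updates])
    print("rank after each merge:", ranks_after_each_merge(updates))
```

Each individual update has rank at most r, while the running total is bounded only by r times the number of merges (and by the matrix size). With random factors the total usually reaches that bound, though this is not guaranteed for every draw.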
