LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

Abstract

In recent years, Transformer-based Large Language Models (LLMs) have made remarkable advances in performance across various domains. As these LLMs are deployed for increasingly complex tasks, they often need to conduct longer reasoning processes or understand larger contexts. In these situations, the length generalization failure of LLMs on long sequences becomes more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMa). LLMs often struggle to generate fluent text, let alone carry out downstream tasks, after longer contexts, even with relative positional encoding, which is designed to cope with this problem. Common solutions such as finetuning on longer corpora often involve daunting hardware and time costs and require careful training process design. To leverage the generation capacity of existing LLMs more efficiently, we theoretically and empirically investigate the main out-of-distribution (OOD) factors contributing to this problem. Inspired by this diagnosis, we propose a simple yet effective solution for on-the-fly length generalization, LM-Infinite, which involves only a Lambda-shaped attention mask and a distance limit while requiring no parameter updates or learning. We find it applicable to a variety of LLMs using relative-position encoding methods. LM-Infinite is computationally efficient, with O(n) time and space complexity, and demonstrates consistent fluency and generation quality up to 32k tokens on the ArXiv and OpenWebText2 datasets, with a 2.72x decoding speedup. On downstream tasks such as passkey retrieval, it continues to work on inputs much longer than the training length, where vanilla models fail immediately.
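As a rough illustration of the two ingredients named in the abstract, the sketch below builds a Lambda-shaped attention mask and applies a distance limit in plain Python. The abstract does not spell out the exact layout, so this assumes that each query attends to a few leading tokens plus a sliding window of recent tokens, and that relative distances are capped at a fixed maximum. The names `lambda_mask`, `clamped_distance`, `n_global`, `n_local`, and `max_distance` are hypothetical and chosen only for this example.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Return a seq_len x seq_len list of booleans.

    mask[i][j] is True when query position i may attend to key position j.
    Assumed layout: causal attention restricted to the first `n_global`
    tokens (one arm of the Lambda) and the `n_local` most recent tokens
    (the other arm).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                  # never look at future tokens
            is_global = j < n_global         # leading tokens stay visible
            is_local = (i - j) < n_local     # sliding window of recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def clamped_distance(i, j, max_distance):
    """Relative distance between query i and key j (j <= i), capped at
    `max_distance`. This is the assumed form of the distance limit."""
    return min(i - j, max_distance)


if __name__ == "__main__":
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions each query attends to at most `n_global + n_local` keys, so the total attention cost grows linearly with sequence length, which is consistent with the O(n) complexity stated in the abstract.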
