YaRN: Efficient Context Window Extension of Large Language Models

Abstract

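Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method for extending the context window of such models that requires 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state of the art in context window extension. In addition, we demonstrate that YaRN can extrapolate beyond the limited context of its fine-tuning dataset. We publish checkpoints of Llama 2 7B and 13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn.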
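As background for the first sentence, here is a minimal sketch of standard RoPE (not YaRN itself), assuming the commonly used base of 10000 and interleaved dimension pairs; the function names are hypothetical. Each pair of query/key dimensions is rotated by an angle proportional to the token position, which makes the query-key dot product depend only on the relative offset between the two positions.

```python
import math


def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """Per-pair rotation frequencies theta_i = base^(-2i/dim), i = 0..dim/2 - 1."""
    if dim % 2:
        raise ValueError("dim must be even")
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle position * theta_i."""
    thetas = rope_frequencies(len(x), base)
    out: list[float] = []
    for i, theta in enumerate(thetas):
        a, b = x[2 * i], x[2 * i + 1]
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        # Standard 2D rotation of the pair (a, b).
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (m - n = 3) at different absolute positions gives the same score.
s1 = dot(apply_rope(q, 10), apply_rope(k, 7))
s2 = dot(apply_rope(q, 103), apply_rope(k, 100))
print(abs(s1 - s2) < 1e-9)  # True
```

Implementations differ in how they pair dimensions (interleaved pairs, as here, versus splitting the vector into two halves), but the relative-position property holds either way. The low-frequency pairs complete only a fraction of a rotation within the training length, so positions beyond it produce rotation angles the model never saw during training.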
