RecycleGPT: An Autoregressive Language Model with Recyclable Module

Abstract

Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model that achieves fast decoding by recycling pre-generated model states instead of running the whole model at every step. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations, so the next token can often be reasonably guessed or inferred from the preceding ones. Through theoretical evaluations and practical tests on downstream text generation tasks, we demonstrate that our approach effectively lowers inference latency, achieving up to 1.4x speedup while preserving high performance.
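The core idea of the abstract can be illustrated with a toy decoding loop: a full forward pass produces a state, and a cheaper component reuses that state to guess the following token. This is a minimal sketch, not the paper's implementation. The callables `full_model` and `recycle_module`, the strict one-to-one alternation between them, and the per-token-pair cost model in `theoretical_speedup` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   full_model(tokens) -> (next_token, hidden_state)   one expensive forward pass
#   recycle_module(hidden_state, last_token) -> token  a cheap predictor that
#       reuses the state the full model has already produced
FullModel = Callable[[List[int]], Tuple[int, List[float]]]
RecycleModule = Callable[[List[float], int], int]


def generate(prompt: List[int], n_new: int,
             full_model: FullModel,
             recycle_module: RecycleModule) -> Tuple[List[int], int]:
    """Decode n_new tokens, alternating full passes with recycled-state guesses.

    Returns the extended token list and the number of full-model passes used.
    """
    tokens = list(prompt)
    target = len(prompt) + n_new
    full_passes = 0
    while len(tokens) < target:
        # Expensive step: run the whole model once.
        next_tok, state = full_model(tokens)
        full_passes += 1
        tokens.append(next_tok)
        if len(tokens) < target:
            # Cheap step: infer the following token from the recycled state
            # instead of running the whole model again.
            tokens.append(recycle_module(state, next_tok))
    return tokens, full_passes


def theoretical_speedup(t_full: float, t_recycle: float) -> float:
    """Speedup of strict alternation over standard decoding, per token pair.

    Standard decoding spends 2 * t_full on two tokens; alternation spends
    t_full + t_recycle on the same two tokens.
    """
    return 2.0 * t_full / (t_full + t_recycle)


if __name__ == "__main__":
    # Toy stand-ins: each "model" simply predicts last_token + 1.
    toy_full = lambda toks: (toks[-1] + 1, [float(toks[-1])])
    toy_recycle = lambda state, last: last + 1
    print(generate([0], 5, toy_full, toy_recycle))  # ([0, 1, 2, 3, 4, 5], 3)
    print(theoretical_speedup(1.0, 0.25))           # 1.6
```

Under this toy cost model the speedup is bounded by 2x and shrinks as the recycled step becomes more expensive relative to a full pass. The sketch ignores output quality, which in practice depends on how well the recycled state predicts the next token.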
