EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
December 8, 2023
Authors: Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou
cs.AI
Abstract
We present EE-LLM, a framework for large-scale training and inference of
early-exit large language models (LLMs). While recent works have shown
preliminary evidence for the efficacy of early exiting in accelerating LLM
inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs
by supporting their training and inference with massive 3D parallelism. Built
upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and
performance optimizations tailored to early exiting, including a lightweight
method that facilitates backpropagation for the early-exit training objective
with pipeline parallelism, techniques for leveraging idle resources in the
original pipeline schedule for computation related to early-exit layers, and
two approaches to early-exit inference that are compatible with KV caching for
autoregressive generation. Our analytical and empirical study shows that EE-LLM
achieves great training efficiency with negligible computational overhead
compared to standard LLM training, as well as outstanding inference speedup
without compromising output quality. To facilitate further research and
adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.
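
To make the idea of early-exit inference concrete, the following is a minimal sketch of one common exit rule: stop at the first exit whose top softmax probability reaches a threshold, and fall back to the final exit otherwise. The confidence rule, the threshold value, and the names `softmax` and `early_exit_predict` are assumptions made for illustration. They are not stated in the abstract and are not EE-LLM's API, and the sketch ignores parallelism and KV caching entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Pick a token from the earliest sufficiently confident exit.

    exit_logits: list of logit vectors, one per exit, ordered from the
    shallowest exit to the final output layer (hypothetical layout).
    Returns (exit_index, token_id).
    """
    if not exit_logits:
        raise ValueError("need at least one exit")
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        token = max(range(len(probs)), key=probs.__getitem__)
        # Exit when confident enough; the final exit always answers.
        if probs[token] >= threshold or i == last:
            return i, token


# Toy example: three exits over a three-token vocabulary.
toy_logits = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 9.0]]
exit_index, token_id = early_exit_predict(toy_logits, threshold=0.9)
```

In a real model the logits of a later exit would only be computed if the earlier exits declined to answer; here they are precomputed so the selection rule can be read on its own.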