

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

February 1, 2024
Authors: Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou
cs.AI

Abstract

This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, which requires significantly less computational resources and training data. Our implementation of EE-Tuning achieves outstanding training efficiency via extensive performance optimizations, as well as scalability due to its full compatibility with 3D parallelism. Results of systematic experiments validate the efficacy of EE-Tuning, confirming that effective early-exit LLM inference can be achieved with a limited training budget. In hope of making early-exit LLMs accessible to the community, we release the source code of our implementation of EE-Tuning at https://github.com/pan-x-c/EE-LLM.
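The core idea described in the abstract, attaching trainable early-exit heads to a frozen pre-trained backbone, can be illustrated with a short PyTorch-style sketch. This is a minimal illustration under assumed interfaces, not the released EE-Tuning implementation; the module and function names (EarlyExitHead, attach_early_exits, tuning_loss) and the choice of a LayerNorm-plus-linear exit head are hypothetical simplifications for clarity.

```python
# Minimal sketch of parameter-efficient early-exit tuning (illustrative only):
# freeze a pre-trained decoder backbone and train only lightweight exit heads
# attached to selected intermediate layers. All names/shapes are assumptions.
import torch
import torch.nn as nn

class EarlyExitHead(nn.Module):
    """Small tunable head mapping an intermediate hidden state to vocabulary logits."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.norm(hidden_states))

def attach_early_exits(backbone: nn.Module, exit_layers, hidden_size, vocab_size):
    """Freeze all backbone parameters and create one trainable head per chosen layer."""
    for p in backbone.parameters():
        p.requires_grad = False  # parameter-efficient: the backbone stays fixed
    return nn.ModuleDict({str(i): EarlyExitHead(hidden_size, vocab_size)
                          for i in exit_layers})

def tuning_loss(layers: nn.ModuleList, embed: nn.Module, exits: nn.ModuleDict,
                input_ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Run the frozen backbone once, accumulating a LM loss at every early exit."""
    loss_fn = nn.CrossEntropyLoss()
    h = embed(input_ids)
    total = torch.zeros((), device=input_ids.device)
    for i, block in enumerate(layers):
        h = block(h)                      # frozen transformer block
        if str(i) in exits:
            logits = exits[str(i)](h)     # only the exit head receives gradients
            total = total + loss_fn(logits.flatten(0, 1), labels.flatten())
    return total
```

At inference time, a token can then be emitted from whichever exit first satisfies a confidence criterion, skipping the remaining layers; the paper's actual training recipe, exit architectures, and 3D-parallel implementation are detailed in the released code at https://github.com/pan-x-c/EE-LLM.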