

Scavenging Hyena: Distilling Transformers into Long Convolution Models

January 31, 2024
Authors: Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami Nur Islam, Wassim Jabbour, Laurence Liang
cs.AI

Abstract

The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, our method replaces attention heads in transformer models with Hyena operators, offering a cost-effective alternative to traditional pre-training while confronting the challenge of processing long contextual information inherent in quadratic attention mechanisms. Unlike conventional compression-focused methods, our technique not only enhances inference speed but also surpasses pre-training in terms of both accuracy and efficiency. In the era of evolving LLMs, our work contributes to the pursuit of sustainable AI solutions, striking a balance between computational power and environmental impact.
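To make the cross-architecture idea concrete, below is a minimal, hypothetical PyTorch sketch of distilling a pretrained attention block into a gated long-convolution ("Hyena-style") student by matching block outputs on the same inputs. The `LongConvBlock` module, the FFT-based causal convolution, the MSE output-matching loss, and the `distill_step` helper are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): replace a transformer attention
# block with a long-convolution student and train it to mimic the teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongConvBlock(nn.Module):
    """Simplified Hyena-style operator: a gated, causal long convolution."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)                 # value and gate branches
        self.filt = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)  # learnable long filter
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                  # x: (batch, seq_len, d_model)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        L = x.shape[1]
        # Causal depthwise convolution with a sequence-length filter,
        # computed via FFT in O(L log L) instead of attention's O(L^2).
        k = F.pad(self.filt[:, :L], (0, L))                # zero-pad to 2L for linear convolution
        v_f = torch.fft.rfft(F.pad(v.transpose(1, 2), (0, L)), dim=-1)
        y = torch.fft.irfft(v_f * torch.fft.rfft(k, dim=-1), dim=-1)[..., :L]
        y = y.transpose(1, 2) * torch.sigmoid(g)           # multiplicative gating
        return self.out_proj(y)


def distill_step(teacher_attn, student_conv, x, optimizer):
    """One cross-architecture distillation step: match the frozen teacher's block output."""
    with torch.no_grad():
        target = teacher_attn(x)                           # pretrained attention block, frozen
    loss = F.mse_loss(student_conv(x), target)             # simple output-matching loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the distillation would be applied block by block (or the assembled hybrid fine-tuned end to end); the accuracy and inference-speed gains reported in the paper come from its own training recipe, which this sketch does not reproduce.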