

Scavenging Hyena: Distilling Transformers into Long Convolution Models

January 31, 2024
作者: Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami Nur Islam, Wassim Jabbour, Laurence Liang
cs.AI

Abstract

The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, our method replaces attention heads in transformer models with Hyena, offering a cost-effective alternative to traditional pre-training while confronting the challenge of processing long contextual information, which is inherent to quadratic attention mechanisms. Unlike conventional compression-focused methods, our technique not only enhances inference speed but also surpasses pre-training in terms of both accuracy and efficiency. In the era of evolving LLMs, our work contributes to the pursuit of sustainable AI solutions, striking a balance between computational power and environmental impact.
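The abstract does not spell out implementation details, but it names two ingredients: swapping the attention mixer in a transformer block for a Hyena-style long convolution, and training the resulting student against the original model's soft targets via knowledge distillation. The following is a minimal PyTorch sketch of those two ideas, not the authors' implementation; the class `LongConvMixer`, the function `distillation_loss`, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only (assumed names, not from the paper): a simplified
# Hyena-style long-convolution token mixer that could stand in for an
# attention block, plus a standard soft-target distillation loss.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LongConvMixer(nn.Module):
    """Gated depthwise long convolution over the sequence dimension,
    computed via FFT in O(L log L) rather than attention's O(L^2)."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)            # value and gate paths
        self.kernel = nn.Parameter(torch.randn(d_model, max_len) * 0.02)  # implicit long filter
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                         # x: (batch, length, d_model)
        B, L, D = x.shape
        v, gate = self.in_proj(x).chunk(2, dim=-1)
        k = self.kernel[:, :L]                                    # (D, L) filter, truncated to length
        # Causal long convolution via zero-padded FFT.
        v_f = torch.fft.rfft(v.transpose(1, 2), n=2 * L)          # (B, D, L+1)
        k_f = torch.fft.rfft(k, n=2 * L)                          # (D, L+1)
        y = torch.fft.irfft(v_f * k_f, n=2 * L)[..., :L]          # keep the causal part
        y = y.transpose(1, 2) * torch.sigmoid(gate)               # multiplicative gating
        return self.out_proj(y)


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, the usual soft-target knowledge-distillation objective."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

In such a setup the rest of the teacher transformer (embeddings, MLPs, norms) would typically be copied into the student unchanged, so only the replacement mixers need to be trained, which is what makes cross-architecture distillation cheaper than pre-training from scratch.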