

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

July 15, 2024
Authors: Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei
cs.AI

Abstract

We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs, which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through estimator to the training. The key results of this work are: (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) we present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training from scratch, continued training of off-the-shelf LLMs, and fine-tuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). In particular, the synergy of BitNet b1.58 and Q-Sparse (which can be equipped with MoE) provides the cornerstone and a clear path to revolutionizing the efficiency of future LLMs, including their cost and energy consumption.
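To make the core mechanism concrete, here is a minimal PyTorch sketch of the idea the abstract describes: top-K sparsification applied to activations in the forward pass, with a straight-through estimator (STE) so that gradients stay dense in the backward pass. This is an illustration under stated assumptions, not the authors' released implementation; the class name `TopKSparsify`, the per-row (last-dimension) top-K, and the tensor shapes are all illustrative choices, and the paper's exact formulation may differ in details.

```python
import torch


class TopKSparsify(torch.autograd.Function):
    """Top-K activation sparsification with a straight-through estimator.

    Forward: keep only the K largest-magnitude entries of each activation
    vector (last dimension) and zero out the rest.
    Backward: pass gradients through unchanged, as if the sparsification
    were the identity function (the STE trick).
    """

    @staticmethod
    def forward(ctx, x, k):
        # Indices of the top-k entries by absolute value, per row.
        _, idx = torch.topk(x.abs(), k, dim=-1)
        mask = torch.zeros_like(x)
        mask.scatter_(-1, idx, 1.0)
        return x * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: gradient flows to all entries, masked or not.
        # (None is the gradient placeholder for the integer argument k.)
        return grad_output, None


# Usage sketch: sparsify a batch of activations and backpropagate.
x = torch.randn(2, 8, requires_grad=True)  # illustrative activations
y = TopKSparsify.apply(x, 3)               # keep 3 of 8 entries per row
y.sum().backward()
print(x.grad)                              # dense gradients via the STE
```

The design point the sketch highlights is why the STE is needed at all: a hard top-K selection has zero gradient almost everywhere for the dropped entries, so treating it as the identity in the backward pass is what keeps all parameters trainable despite fully sparse forward activations.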
