

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

April 2, 2024
作者: David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro
cs.AI

Abstract

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens (k) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-k routing mechanism. Since k is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the k tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.
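To make the routing idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a Mixture-of-Depths style block: a learned router scores each token, only the top-k tokens pass through the attention and MLP computation, and the rest ride the residual stream unchanged. The class name `MoDBlock`, the `capacity` argument, the sigmoid gating on the router scores, and the omission of causal masking are illustrative simplifications, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MoDBlock(nn.Module):
    """Sketch of a Mixture-of-Depths block: only the top-k scored tokens are
    processed by attention/MLP; all other tokens skip the block via the residual."""

    def __init__(self, d_model: int, n_heads: int, capacity: int):
        super().__init__()
        self.capacity = capacity                      # k: tokens processed per sequence
        self.router = nn.Linear(d_model, 1)           # scalar routing score per token
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x).squeeze(-1)                    # (batch, seq_len)
        k = min(self.capacity, x.shape[1])                     # k is fixed -> static shapes
        topk = torch.topk(scores, k, dim=-1)
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, x.shape[-1])
        routed = torch.gather(x, 1, idx)                       # (batch, k, d_model)

        # Standard pre-norm attention + MLP applied to the routed subset only.
        # (Causal masking is omitted here for brevity.)
        h = self.norm1(routed)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        routed = routed + attn_out
        routed = routed + self.mlp(self.norm2(routed))

        # Gate the block's contribution by the router score so the router receives
        # gradients (sigmoid gating is an assumption of this sketch), then scatter
        # the processed tokens back into the full-length residual stream.
        gate = torch.sigmoid(topk.values).unsqueeze(-1)        # (batch, k, 1)
        delta = routed - torch.gather(x, 1, idx)               # what the block added
        out = x.clone()
        out.scatter_add_(1, idx, gate * delta)
        return out
```

Because k is chosen ahead of time, every tensor in the block has a known, fixed size regardless of which tokens the router selects, which is the property the abstract contrasts with other conditional-computation techniques.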
