
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

December 9, 2023
Authors: Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
cs.AI

Abstract

MEGA is a recent transformer-based architecture which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as O(L log L), with L being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network, which permits a larger receptive field with shallower networks and reduces the computational complexity to O(L). The resulting model is called TCNCA, a Temporal Convolutional Network with Chunked Attention. We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, and a synthetic reasoning benchmark, associative recall. On EnWik8, TCNCA outperforms MEGA, reaching a lower loss with a 1.37×/1.24× faster forward/backward pass during training. The dilated convolutions used in TCNCA are consistently and significantly faster than the FFT-based parallelized recurrence on GPUs, making them a scalable candidate for handling very large sequence lengths: they are up to 7.07×/2.86× faster in the forward/backward pass for sequences up to 131k. Further, on LRA, TCNCA achieves, on average, a 1.28× speed-up during inference with accuracy similar to MEGA's. On associative recall, we find that even a simplified version of TCNCA, without excessive multiplicative and additive interactions, remains superior or competitive to MEGA across a range of sequence lengths and vocabulary sizes.
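The two ingredients named in the title can be illustrated with short sketches. Below is a minimal PyTorch sketch of a stack of dilated causal 1-D convolutions, the kind of temporal convolutional network the abstract describes; the module name, the residual structure, and the hyperparameters are illustrative assumptions, not the authors' exact TCNCA block. With kernel size k and dilations doubling per layer, the receptive field is 1 + (k-1)(2^depth - 1), so a shallow stack covers a long context while each layer costs only O(L) in the sequence length L.

```python
import torch
import torch.nn as nn


class DilatedTCN(nn.Module):
    """Stack of dilated causal 1-D convolutions (illustrative sketch only).

    With kernel size k and dilation doubling per layer, the receptive field
    is 1 + (k - 1) * (2**depth - 1), so a shallow stack sees a long context
    while each layer costs O(L) in the sequence length L.
    """

    def __init__(self, dim: int, kernel_size: int = 3, depth: int = 5):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(
                dim, dim, kernel_size,
                dilation=2 ** i,
                # Symmetric padding; the right overhang is trimmed in
                # forward() to keep the convolution causal.
                padding=(kernel_size - 1) * 2 ** i,
            )
            for i in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, L)
        L = x.shape[-1]
        for conv in self.convs:
            # Trim so the output at position t only sees inputs <= t.
            y = conv(x)[..., :L]
            x = x + torch.relu(y)  # simple residual; TCNCA's block differs
        return x
```

Likewise, the chunked attention referred to in the name restricts softmax attention to non-overlapping chunks, so the cost is O(L·c) for chunk length c rather than the O(L²) of full attention. The function below is a bare-bones assumption-laden illustration, not the paper's implementation, which follows MEGA's gated attention design.

```python
def chunked_attention(q, k, v, chunk: int):
    """Softmax attention restricted to non-overlapping chunks.

    q, k, v: (batch, L, dim) with L divisible by chunk (assumed here for
    brevity; real inputs would be padded). Cost is O(L * chunk) instead
    of the O(L**2) of full attention.
    """
    b, L, d = q.shape
    n = L // chunk
    # Fold the sequence into n chunks of length `chunk`.
    q = q.view(b, n, chunk, d)
    k = k.view(b, n, chunk, d)
    v = v.view(b, n, chunk, d)
    # Scaled dot-product attention within each chunk.
    scores = torch.einsum("bncd,bnkd->bnck", q, k) / d ** 0.5
    out = torch.einsum("bnck,bnkd->bncd", scores.softmax(dim=-1), v)
    return out.reshape(b, L, d)
```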