TRAMS: Training-free Memory Selection for Long-range Language Modeling
October 24, 2023
Authors: Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi
cs.AI
Abstract
The Transformer architecture is crucial for numerous AI models, but it still
faces challenges in long-range language modeling. Though several specialized
Transformer architectures have been designed to tackle issues of long-range
dependencies, existing methods like Transformer-XL are plagued by a high
percentage of ineffective memories. In this study, we present a plug-and-play
strategy, known as TRAining-free Memory Selection (TRAMS), that selects the tokens
participating in attention calculation based on one simple metric. This
strategy allows us to keep the tokens that are likely to have high attention
scores with the current queries and to ignore the others. We have tested our
approach on a word-level benchmark (WikiText-103) and a character-level
benchmark (enwik8), and the results indicate improvements without any
additional training or added parameters.