TRAMS: Training-free Memory Selection for Long-range Language Modeling
October 24, 2023
Authors: Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi
cs.AI
Abstract
The Transformer architecture is crucial for numerous AI models, but it still
faces challenges in long-range language modeling. Though several specific
Transformer architectures have been designed to tackle issues of long-range
dependencies, existing methods like Transformer-XL are plagued by a high
percentage of ineffective memories. In this study, we present a plug-and-play
strategy, known as TRAining-free Memory Selection (TRAMS), that selects tokens
participating in attention calculation based on one simple metric. This
strategy allows us to keep tokens that are likely to have a high attention
score with the current queries and ignore the other ones. We have tested our
approach on the word-level benchmark (WikiText-103) and the character-level
benchmark (enwik8), and the results indicate an improvement without
additional training or added parameters.
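As a rough illustration of the idea, the sketch below scores cached memory tokens with a simple query-independent metric and lets only the top-scoring subset participate in attention alongside the local context. The choice of metric (the L2 norm of the layer-normalized key), the helper names select_memory and attend_with_selected_memory, and the memory budget m are assumptions made for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def select_memory(mem_k, mem_v, m):
    # Score each cached memory token with a query-independent metric
    # (here: L2 norm of the layer-normalized key) and keep the top-m.
    normed = F.layer_norm(mem_k, mem_k.shape[-1:])
    metric = normed.norm(dim=-1)                      # (mem_len,)
    idx = metric.topk(min(m, mem_k.size(0))).indices  # indices of kept tokens
    return mem_k[idx], mem_v[idx]

def attend_with_selected_memory(q, local_k, local_v, mem_k, mem_v, m=64):
    # Scaled dot-product attention over the local context plus only the
    # selected memory tokens, rather than the full memory. Causal masking
    # over the local tokens is omitted for brevity.
    sel_k, sel_v = select_memory(mem_k, mem_v, m)
    k = torch.cat([sel_k, local_k], dim=0)            # (m + local_len, d)
    v = torch.cat([sel_v, local_v], dim=0)
    scores = q @ k.t() / k.size(-1) ** 0.5            # (q_len, m + local_len)
    return F.softmax(scores, dim=-1) @ v

Because the selection step only ranks and gathers existing key/value caches, it adds no trainable parameters and can be dropped into a pretrained memory-augmented model at inference time.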