Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
August 10, 2024
Authors: MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic
cs.AI
Abstract
Despite their recent successes, Transformer-based large language models show surprising failure modes. A well-known example is their inability to length-generalize: solving problem instances at inference time that are longer than those seen during training. In this work, we further explore the root cause of this failure by performing a detailed analysis of model behaviors on the simple parity task. Our analysis suggests that length generalization failures are intricately related to a model's inability to perform random memory accesses within its context window. We present supporting evidence for this hypothesis by demonstrating the effectiveness of methodologies that circumvent the need for indexing or that enable random token access indirectly, through content-based addressing. We further show where and how the failure to perform random memory access manifests through attention map visualizations.