Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
August 10, 2024
Authors: MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic
cs.AI
Abstract
Despite their recent successes, Transformer-based large language models show surprising failure modes. A well-known example is their inability to length-generalize: solving problem instances at inference time that are longer than those seen during training. In this work, we further explore the root cause of this failure by performing a detailed analysis of model behaviors on the simple parity task. Our analysis suggests that length generalization failures are intricately related to a model's inability to perform random memory accesses within its context window. We present supporting evidence for this hypothesis by demonstrating the effectiveness of methodologies that circumvent the need for indexing or that enable random token access indirectly, through content-based addressing. We further show where and how the failure to perform random memory access manifests through attention map visualizations.