Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
August 10, 2024
Authors: MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic
cs.AI
Abstract
Despite their recent successes, Transformer-based large language models show
surprising failure modes. A well-known example of such failure modes is their
inability to length-generalize: solving problem instances at inference time
that are longer than those seen during training. In this work, we further
explore the root cause of this failure by performing a detailed analysis of
model behaviors on the simple parity task. Our analysis suggests that length
generalization failures are intricately related to a model's inability to
perform random memory accesses within its context window. We present supporting
evidence for this hypothesis by demonstrating the effectiveness of
methodologies that circumvent the need for indexing or that enable random token
access indirectly, through content-based addressing. We further show where and
how the failure to perform random memory access manifests through attention map
visualizations.
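The parity task at the center of the analysis is easy to state concretely: given a bit string, output 1 if it contains an odd number of 1s, else 0. The sketch below is a minimal illustration of the length-generalization protocol the abstract describes (train on short instances, test on strictly longer ones), not the authors' experimental code; the `predict` argument is a hypothetical stand-in for a trained Transformer.

```python
import random

def parity(bits):
    # Ground truth: 1 if the number of 1s is odd, else 0.
    return sum(bits) % 2

def make_instance(length):
    # Sample a uniformly random bit string of the given length.
    return [random.randint(0, 1) for _ in range(length)]

def evaluate(predict, lengths, trials=1000):
    # Accuracy of `predict` (a hypothetical model wrapper) per length.
    results = {}
    for n in lengths:
        correct = 0
        for _ in range(trials):
            bits = make_instance(n)
            correct += predict(bits) == parity(bits)
        results[n] = correct / trials
    return results

# Length-generalization probe: a model trained only on lengths <= 20
# would be scored here on strictly longer instances. A model that
# cannot index into its context typically collapses to chance (~0.5);
# the oracle predictor below scores 1.0 at every length by construction.
print(evaluate(parity, [30, 40, 50]))
```

One family of workarounds the abstract alludes to circumvents indexing entirely by making the computation sequential. A common formulation, shown here as a hedged sketch rather than the paper's exact format, is a scratchpad target in which the model emits a running parity, so each step depends only on the previous partial result and the next bit rather than on a random token position.

```python
def scratchpad_target(bits):
    # Serialize a cumulative-parity scratchpad: "p1 p2 ... pn | answer=pn".
    # Each prefix parity is derivable from its predecessor, so no random
    # (index-based) access into the context is required.
    steps, acc = [], 0
    for b in bits:
        acc ^= b
        steps.append(str(acc))
    return " ".join(steps) + f" | answer={acc}"

print(scratchpad_target([1, 0, 1, 1]))  # -> "1 1 0 1 | answer=1"
```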