Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers

Abstract

Despite their recent successes, Transformer-based large language models show surprising failure modes. A well-known example is their inability to length-generalize, that is, to solve problem instances at inference time that are longer than those seen during training. In this work, we further explore the root cause of this failure by performing a detailed analysis of model behaviors on the simple parity task. Our analysis suggests that length generalization failures are intricately related to a model's inability to perform random memory accesses within its context window. We present supporting evidence for this hypothesis by demonstrating the effectiveness of methodologies that circumvent the need for indexing or that enable random token access indirectly, through content-based addressing. We further show where and how the failure to perform random memory access manifests through attention map visualizations.
