Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Abstract

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Recent innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities, a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of the exponential function, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a single, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities, as evaluated on the Multi-Query Associative Recall task and on overall language modeling on the Pile dataset.
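
For readers unfamiliar with the idea of a Taylor-inspired kernel in a Linear Transformer, the following is a minimal sketch and not the authors' implementation. It assumes a second-order truncation of the exponential, exp(x) ≈ 1 + x + x²/2, which the abstract does not specify, and it omits the convolutional component and the learnable modification proposed in the paper. The function names (`taylor_feature_map`, `causal_linear_attention`) are hypothetical and chosen for illustration only.

```python
import math


def taylor_feature_map(x):
    """Map x to [1, x, (x outer x) / sqrt(2)].

    With this map, phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, i.e. the
    second-order Taylor approximation of exp(q.k) (assumed truncation order).
    """
    second_order = [a * b / math.sqrt(2) for a in x for b in x]
    return [1.0] + list(x) + second_order


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_linear_attention(queries, keys, values):
    """Causal attention computed with running sums instead of an n x n matrix."""
    feat_dim = len(taylor_feature_map(keys[0]))
    val_dim = len(values[0])
    kv_state = [[0.0] * val_dim for _ in range(feat_dim)]  # running sum of phi(k) v^T
    k_state = [0.0] * feat_dim                             # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current key/value pair into the running state.
        phi_k = taylor_feature_map(k)
        for i, f in enumerate(phi_k):
            k_state[i] += f
            for j, v_j in enumerate(v):
                kv_state[i][j] += f * v_j
        # Read out with the current query; 1 + x + x^2/2 > 0, so norm > 0.
        phi_q = taylor_feature_map(q)
        norm = dot(phi_q, k_state)
        outputs.append([
            sum(phi_q[i] * kv_state[i][j] for i in range(feat_dim)) / norm
            for j in range(val_dim)
        ])
    return outputs
```

Because the state (`kv_state`, `k_state`) has a fixed size, the cost per token does not grow with the sequence length, which is what makes this family of models subquadratic.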
