TRAMS: Training-free Memory Selection for Long-range Language Modeling

Abstract

The Transformer architecture is central to numerous AI models, but it still faces challenges in long-range language modeling. Although several specialized Transformer architectures have been designed to handle long-range dependencies, existing methods such as Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present TRAining-free Memory Selection (TRAMS), a plug-and-play strategy that selects the tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and to ignore the others. We tested our approach on a word-level benchmark (WikiText-103) and a character-level benchmark (enwik8), and the results indicate an improvement without additional training or additional parameters.
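
The selection step described above can be illustrated with a minimal sketch. The abstract does not name the metric, so the code below assumes a hypothetical one (the L2 norm of each cached key) purely for illustration. The function names `select_memory` and `attention`, the tensor sizes, and the `keep` budget are likewise assumptions, and the causal mask and multi-head structure are omitted for brevity. It is not the authors' implementation.

```python
import numpy as np

def select_memory(mem_keys, mem_values, keep):
    """Keep the `keep` memory slots that score highest under a
    query-independent metric, without any training."""
    # Hypothetical metric: L2 norm of each cached key.
    # This is an assumption; the abstract does not specify the metric.
    scores = np.linalg.norm(mem_keys, axis=-1)
    # Take the top-`keep` indices, then restore their original order.
    idx = np.sort(np.argsort(scores)[-keep:])
    return mem_keys[idx], mem_values[idx], idx

def attention(queries, keys, values):
    """Plain scaled dot-product attention (no mask, single head)."""
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
d = 8                                   # illustrative head dimension
mem_k = rng.normal(size=(16, d))        # cached memory keys
mem_v = rng.normal(size=(16, d))        # cached memory values
cur_k = rng.normal(size=(3, d))         # keys of the current segment
cur_v = rng.normal(size=(3, d))         # values of the current segment
q = rng.normal(size=(3, d))             # queries of the current segment

# Select a subset of the memory, then attend over it plus the current segment.
sel_k, sel_v, idx = select_memory(mem_k, mem_v, keep=4)
out = attention(q, np.concatenate([sel_k, cur_k]), np.concatenate([sel_v, cur_v]))
```

Because the selection happens before attention and introduces no learned weights, a step like this can in principle be dropped into an existing memory-augmented model at inference time, which is what makes the strategy plug-and-play.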
