Breaking the Attention Bottleneck

Abstract

Attention-based transformers have become the standard architecture in many deep learning fields, primarily because they can model long-range dependencies and handle variable-length input sequences. However, the attention mechanism, with its quadratic complexity, is a significant bottleneck in the transformer architecture. Attention is only unidirectional in the decoder and converges to a static pattern in over-parameterized decoder-only models. I address this issue by developing a generative function as a replacement for attention or activation. It retains the autoregressive property by comparing each token with the previous one. In my test setting with nanoGPT, this yields a lower loss with a smaller model. The loss drops further when an average context vector is incorporated. This concept of attention replacement is distributed under the GNU AGPL v3 license at https://gitlab.com/Bachstelze/causal_generation.
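The abstract does not spell out the generative function, so the following is only a minimal sketch of the two ideas it names: comparing each token with the previous one, and adding an average context vector. The function name `causal_mix`, the use of an elementwise difference as the comparison, and the running mean over earlier tokens are all assumptions made for illustration. They are not taken from the linked repository.

```python
def causal_mix(tokens, use_mean_context=True):
    """Mix each token with its predecessor (hypothetical stand-in for attention).

    tokens: non-empty list of T vectors, each a list of D floats.
    Returns a list of T vectors of the same size.
    """
    dim = len(tokens[0])
    prev = [0.0] * dim         # position 0 has no predecessor, so compare with zeros
    running_sum = [0.0] * dim  # sum of tokens 0..t, updated once per step
    out = []
    for t, tok in enumerate(tokens):
        # Assumed comparison: elementwise difference to the previous token.
        mixed = [a - b for a, b in zip(tok, prev)]
        if use_mean_context:
            # Assumed context: mean of all tokens seen so far (causal by construction).
            running_sum = [s + a for s, a in zip(running_sum, tok)]
            mixed = [m + s / (t + 1) for m, s in zip(mixed, running_sum)]
        out.append(mixed)
        prev = tok
    return out


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = causal_mix(x)
```

Each output position depends only on the current and earlier tokens, so the sketch stays autoregressive. It also makes a single pass over the sequence, which is linear in sequence length rather than quadratic as in full attention.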
