Avey-B

Abstract

Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
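To make the contrast between bidirectional and autoregressive contextualization concrete, the following is a minimal sketch of single-head scaled dot-product self-attention, with and without a causal mask. It illustrates the BERT-style baseline mechanism only; it is not the Avey-B architecture, which the abstract describes as attention-free. The function names, the identity projections (Q = K = V = x), and the toy inputs are assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Single-head self-attention with identity projections (Q = K = V = x).

    x is a list of n token vectors, each of dimension d.
    With causal=False every token attends to every other token
    (bidirectional, encoder-style). With causal=True token i attends
    only to positions j <= i (autoregressive, decoder-style).
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Scaled dot-product score between token i and every token j.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(n)
        ]
        if causal:
            # Mask out future positions so they receive zero weight.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output for token i is the attention-weighted sum of all token vectors.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return out


# Hypothetical toy input: three tokens with two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bidirectional = self_attention(tokens)
causal = self_attention(tokens, causal=True)
```

In this sketch each output row is computed independently of the others, which is the sequence-level parallelism the abstract refers to. Under the causal mask the first token can see only itself, whereas in the bidirectional case its representation also mixes in the later tokens.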

