Avey-B

Abstract

Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
