Avey-B
February 17, 2026
Authors: Devang Acharya, Mohammad Hammoud
cs.AI
Abstract
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.