
Avey-B

February 17, 2026
Authors: Devang Acharya, Mohammad Hammoud
cs.AI

Abstract

Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
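The abstract lists stability-oriented normalization among the architectural changes but does not specify the scheme. As a hedged illustration only (the paper's exact normalization is not given here), an RMS-style layer is one common stability-oriented choice in modern encoders, since it rescales each token's feature vector without the mean-subtraction of LayerNorm:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Rescale each feature vector by its root-mean-square, then apply a
    # learnable per-feature scale. This is a generic stability-oriented
    # normalization, not necessarily the one used in Avey-B.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

d = 8
x = np.random.randn(4, d)   # (tokens, features); hypothetical shapes
w = np.ones(d)              # learnable scale, initialized to 1
y = rms_norm(x, w)          # each output vector now has RMS close to 1
```

With the scale initialized to ones, the output preserves each token's direction while bounding its magnitude, which is the usual motivation for such layers in deep stacks.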
PDF · February 24, 2026