

NeoBERT: A Next-Generation BERT

February 26, 2025
Authors: Lola Le Breton, Quentin Fournier, Mariam El Mezouar, Sarath Chandar
cs.AI

Abstract

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite being foundational for many downstream NLP applications. To bridge this gap, we introduce NeoBERT, a next-generation encoder that redefines the capabilities of bidirectional models by integrating state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions. In addition, we rigorously evaluate the impact of each modification on GLUE and design a uniform fine-tuning and evaluation framework for MTEB. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.
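The abstract presents NeoBERT as a plug-and-play replacement for existing base encoders with a 4,096-token context window. As a minimal sketch of what that adoption path typically looks like, the snippet below loads the model through the Hugging Face transformers Auto classes and extracts a sentence embedding the same way one would with BERT or RoBERTa. The repository id "chandar-lab/NeoBERT" and the trust_remote_code flag are assumptions for illustration, not details stated in the abstract.

```python
# Sketch: using NeoBERT as a drop-in encoder via Hugging Face transformers.
from transformers import AutoTokenizer, AutoModel

model_id = "chandar-lab/NeoBERT"  # assumed checkpoint name, not given in the abstract
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Encode a sentence and take the hidden state at the first ([CLS]) position
# as a sentence representation, exactly as with BERT or RoBERTa.
inputs = tokenizer(
    "NeoBERT extends the context window to 4,096 tokens.",
    return_tensors="pt",
)
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)
print(embedding.shape)
```

Because the interface mirrors existing encoders, downstream fine-tuning pipelines built around BERT-style models should require only swapping the checkpoint name.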

