
ModernGBERT: German-only 1B Encoder Model Trained from Scratch

May 19, 2025
Authors: Anton Ehrmanntraut, Julia Wunderle, Jan Pfister, Fotis Jannidis, Andreas Hotho
cs.AI

Abstract

Despite the prominence of decoder-only language models, encoders remain crucial for resource-constrained applications. We introduce ModernGBERT (134M, 1B), a fully transparent family of German encoder models trained from scratch, incorporating architectural innovations from ModernBERT. To evaluate the practical trade-offs of training encoders from scratch, we also present LLäMmlein2Vec (120M, 1B, 7B), a family of encoders derived from German decoder-only models via LLM2Vec. We benchmark all models on natural language understanding, text embedding, and long-context reasoning tasks, enabling a controlled comparison between dedicated encoders and converted decoders. Our results show that ModernGBERT 1B outperforms prior state-of-the-art German encoders as well as encoders adapted via LLM2Vec, with regard to performance and parameter-efficiency. All models, training data, checkpoints and code are publicly available, advancing the German NLP ecosystem with transparent, high-performance encoder models.
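
Since the models are released publicly, a natural first use is extracting German sentence embeddings with Hugging Face transformers. The sketch below is illustrative only: the hub id `LSX-UniWue/ModernGBERT_1B` is an assumption (substitute the actual id from the paper's release), and mean pooling is one common pooling choice, not necessarily the method used in the paper's embedding benchmarks.

```python
# Minimal sketch: German sentence embeddings from an encoder via transformers.
# Assumes a recent transformers release with ModernBERT support (>= 4.48) and
# that the hypothetical hub id below matches the paper's actual release.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "LSX-UniWue/ModernGBERT_1B"  # hypothetical id, not confirmed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "Berlin ist die Hauptstadt von Deutschland.",
    "Die Hauptstadt Deutschlands ist Berlin.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# The two paraphrases should land close together in embedding space.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```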
