ModernGBERT: German-only 1B Encoder Model Trained from Scratch

Abstract

Despite the prominence of decoder-only language models, encoders remain crucial for resource-constrained applications. We introduce ModernGBERT (134M, 1B), a fully transparent family of German encoder models trained from scratch that incorporates architectural innovations from ModernBERT. To evaluate the practical trade-offs of training encoders from scratch, we also present LLäMmlein2Vec (120M, 1B, 7B), a family of encoders derived from German decoder-only models via LLM2Vec. We benchmark all models on natural language understanding, text embedding, and long-context reasoning tasks, enabling a controlled comparison between dedicated encoders and converted decoders. Our results show that ModernGBERT 1B outperforms both prior state-of-the-art German encoders and encoders adapted via LLM2Vec in performance and parameter efficiency. All models, training data, checkpoints, and code are publicly available, advancing the German NLP ecosystem with transparent, high-performance encoder models.
