Bielik 11B v2 Technical Report
May 5, 2025
Authors: Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej, Remigiusz Kinas
cs.AI
Abstract
We present Bielik 11B v2, a state-of-the-art language model optimized for
Polish text processing. Built on the Mistral 7B v0.2 architecture and scaled to
11B parameters using depth up-scaling, this model demonstrates exceptional
performance across Polish language benchmarks while maintaining strong
cross-lingual capabilities. We introduce two key technical innovations:
Weighted Instruction Cross-Entropy Loss, which optimizes learning across
diverse instruction types by assigning quality-based weights to training
examples, and Adaptive Learning Rate, which dynamically adjusts based on
context length. Comprehensive evaluation across multiple benchmarks
demonstrates that Bielik 11B v2 outperforms many larger models, including those
with 2-6 times more parameters, and significantly surpasses other specialized
Polish language models on tasks ranging from linguistic understanding to
complex reasoning. The model's parameter efficiency and extensive quantization
options enable deployment across various hardware configurations, advancing
Polish language AI capabilities and establishing new benchmarks for
resource-efficient language modeling in less-represented languages.
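
The abstract describes the Weighted Instruction Cross-Entropy Loss only at a high level. The following is a minimal PyTorch sketch of the idea, assuming one quality weight per training example and the usual ignore-index masking of prompt and padding tokens; the function name and the exact weighting scheme are illustrative assumptions, not the report's implementation.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce_loss(logits, labels, example_weights, ignore_index=-100):
    """Cross-entropy over a batch of instruction examples, where each
    example's average token loss is scaled by a quality-based weight.

    logits:          (batch, seq_len, vocab) model outputs
    labels:          (batch, seq_len) target token ids; prompt/padding = ignore_index
    example_weights: (batch,) per-example quality weights, e.g. in [0, 1]
    """
    batch, seq_len, vocab = logits.shape
    # Per-token cross-entropy, kept unreduced so the batch structure survives.
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab),
        labels.reshape(-1),
        ignore_index=ignore_index,
        reduction="none",
    ).reshape(batch, seq_len)
    # Average loss over the non-ignored tokens of each example.
    mask = (labels != ignore_index).float()
    per_example = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    # Weight each example by its quality score and normalize by the total weight.
    return (example_weights * per_example).sum() / example_weights.sum().clamp(min=1e-8)
```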
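Similarly, the abstract states only that the Adaptive Learning Rate adjusts dynamically with context length. The scaling rule below, an inverse square-root against an assumed 4096-token reference length, is one plausible form sketched for illustration and is not taken from the report.

```python
def context_adaptive_lr(base_lr, context_length, reference_length=4096, exponent=0.5):
    """Scale the base learning rate by (reference / context) ** exponent.
    Longer contexts contribute more tokens (and gradient signal) per step,
    so the step size is reduced as the context grows. All constants here
    are assumptions, not values from the report.
    """
    return base_lr * (reference_length / max(context_length, 1)) ** exponent

# Example: a 16k-token batch gets half the step of a 4k-token batch.
lr = context_adaptive_lr(base_lr=2e-5, context_length=16384)  # -> 1e-5
```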