Bielik 11B v2 Technical Report
May 5, 2025
Authors: Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej, Remigiusz Kinas
cs.AI
Abstract
We present Bielik 11B v2, a state-of-the-art language model optimized for
Polish text processing. Built on the Mistral 7B v0.2 architecture and scaled to
11B parameters using depth up-scaling, this model demonstrates exceptional
performance across Polish language benchmarks while maintaining strong
cross-lingual capabilities. We introduce two key technical innovations:
Weighted Instruction Cross-Entropy Loss, which optimizes learning across
diverse instruction types by assigning quality-based weights to training
examples, and Adaptive Learning Rate, which dynamically adjusts based on
context length. Comprehensive evaluation across multiple benchmarks
demonstrates that Bielik 11B v2 outperforms many larger models, including those
with 2-6 times more parameters, and significantly surpasses other specialized
Polish language models on tasks ranging from linguistic understanding to
complex reasoning. The model's parameter efficiency and extensive quantization
options enable deployment across various hardware configurations, advancing
Polish language AI capabilities and establishing new benchmarks for
resource-efficient language modeling in less-represented languages.
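The abstract names depth up-scaling but does not spell it out. Below is a minimal sketch of the general technique, in which a pretrained layer stack is duplicated and two overlapping slices are spliced together to deepen the network; the function name and the `keep` split are illustrative assumptions, since the abstract does not give Bielik's exact layer counts.

```python
from copy import deepcopy

def depth_up_scale(layers, keep):
    """Generic depth up-scaling sketch: duplicate a pretrained layer stack
    and splice the bottom `keep` layers of one copy onto the top `keep`
    layers of the other, yielding a deeper model of 2 * keep layers.

    The exact slice sizes used for Bielik 11B v2 are not stated in the
    abstract; `keep` is an assumed knob (keep > len(layers) // 2 gives
    an overlap of duplicated middle layers).
    """
    bottom = [deepcopy(layer) for layer in layers[:keep]]   # bottom slice, copy 1
    top = [deepcopy(layer) for layer in layers[-keep:]]     # top slice, copy 2
    return bottom + top
```

Splicing alone degrades quality; depth up-scaled models are typically pretrained further so the duplicated layers can re-specialize.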
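The Weighted Instruction Cross-Entropy Loss is described here only as assigning quality-based weights to training examples. A minimal PyTorch sketch under that reading follows; the per-example `quality_weights` tensor, its source, and its range are assumptions rather than details from the report.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce_loss(logits, labels, quality_weights, ignore_index=-100):
    """Cross-entropy in which each example's loss is scaled by a
    quality-based weight (one plausible reading of the abstract).

    logits:          (batch, seq_len, vocab) model outputs
    labels:          (batch, seq_len) targets; ignore_index marks prompt/padding
    quality_weights: (batch,) per-example weights, e.g. from a data-quality score
    """
    batch, seq_len, vocab = logits.shape
    # Token-level CE with no reduction, so each example can be weighted.
    token_loss = F.cross_entropy(
        logits.view(-1, vocab), labels.view(-1),
        ignore_index=ignore_index, reduction="none",
    ).view(batch, seq_len)
    # Average over non-ignored positions per example.
    mask = (labels != ignore_index).float()
    per_example = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    # Weight each example, normalizing by the total weight in the batch.
    return (quality_weights * per_example).sum() / quality_weights.sum().clamp(min=1e-8)
```

Normalizing by the sum of the weights rather than the batch size keeps the loss scale stable when batches mix high- and low-quality examples.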
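Likewise, the Adaptive Learning Rate is characterized only as adjusting dynamically with context length. One plausible rule, scaling a base rate by the square root of the ratio between the batch's sequence length and a reference length, is sketched below; the sqrt form, the reference length, and the base rate are illustrative choices, not the authors' published formula.

```python
def adaptive_lr(base_lr, context_length, reference_length=4096):
    """Illustrative context-length-dependent learning rate: longer batches
    carry more token-level gradient signal per step, so the rate is scaled
    to keep update magnitudes roughly comparable. The sqrt rule and the
    reference_length default are assumptions, not Bielik's published rule.
    """
    return base_lr * (context_length / reference_length) ** 0.5

# Hypothetical usage inside a training loop:
# for batch in loader:
#     for group in optimizer.param_groups:
#         group["lr"] = adaptive_lr(2e-5, batch["input_ids"].shape[1])
```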