Qwen2 Technical Report

Abstract

This report introduces the Qwen2 series, the latest addition to our family of large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models ranging from 0.5 to 72 billion parameters, comprising dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and is competitive with proprietary models across diverse benchmarks covering language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. As a base language model, the flagship Qwen2-72B scores 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH. The instruction-tuned variant, Qwen2-72B-Instruct, scores 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Qwen2 also demonstrates robust multilingual capabilities, with proficiency in approximately 30 languages, including English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, and Vietnamese, which underscores its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials, including example code, on GitHub. These platforms also provide resources for quantization, fine-tuning, and deployment, supporting a wide range of applications and research endeavors.
