
Qwen2 Technical Report

July 15, 2024
作者: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Zeyu Cui, Zhenru Zhang, Zhihao Fan
cs.AI

Abstract

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials, including example code, on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.
