Qwen Technical Report

September 28, 2023
Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
cs.AI

Abstract

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.