Qwen Technical Report

September 28, 2023
Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
cs.AI

Abstract

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.