

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

June 18, 2024
Authors: Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang
cs.AI

Abstract

We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models have been pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small set of corpora from 24 languages, and are aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process that involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including a web browser, Python interpreter, text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools on tasks such as accessing online information via web browsing and solving math problems with the Python interpreter. Over the course of this work, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.
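
The open checkpoints mentioned above can typically be loaded with the standard Hugging Face transformers API. The following is a minimal sketch only: the repository id THUDM/glm-4-9b-chat, the need for trust_remote_code=True, and the chat-template call are assumptions based on common Hugging Face conventions, not details given in the abstract.

```python
# Minimal sketch: load an open GLM-4 checkpoint from the THUDM organization on Hugging Face.
# Repo id and trust_remote_code are assumptions; check https://huggingface.co/THUDM for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4-9b-chat"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat-style prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize what GLM-4 All Tools can do."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```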
