

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

June 18, 2024
Authors: Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang
cs.AI

Abstract

We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including the web browser, Python interpreter, text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks such as accessing online information via web browsing and solving math problems using the Python interpreter. Along the way, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.
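The abstract notes that the open models are distributed through the THUDM organization on GitHub and Hugging Face. As a minimal sketch (not taken from the paper), the snippet below shows how one might load and query an open GLM-4-9B chat checkpoint with the Hugging Face `transformers` library; the repo id `THUDM/glm-4-9b-chat`, dtype, and generation settings are assumptions and may differ from the officially documented usage.

```python
# Minimal sketch: querying an open GLM-4-9B chat checkpoint via transformers.
# Repo id and settings are assumptions; consult the THUDM model card for the
# officially supported usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "THUDM/glm-4-9b-chat"  # assumed checkpoint under https://huggingface.co/THUDM
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # assumed precision; adjust to your hardware
    device_map="auto",
    trust_remote_code=True,       # GLM checkpoints ship custom modeling code
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize what GLM-4 All Tools can do."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```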
