
XGen-7B Technical Report

September 7, 2023
作者: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
cs.AI

Abstract

Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B-parameter models, on sequences of up to 8K tokens and on up to 1.5T tokens of data. We have also fine-tuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancement and commercial applications. Our evaluation on standard benchmarks shows that the XGen models achieve results comparable to or better than state-of-the-art open-source LLMs. Our targeted evaluation on long-sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
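
Since the models are released openly, they should be loadable with standard tooling. Below is a minimal sketch using Hugging Face transformers; the repository ID (Salesforce/xgen-7b-8k-base) and the need for trust_remote_code follow the public XGen release conventions but are assumptions, not details stated in this abstract.

```python
# Minimal sketch: load an XGen checkpoint and generate text with
# Hugging Face transformers. The repo ID and the trust_remote_code flag
# are assumptions based on the public release, not stated in the abstract.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Salesforce/xgen-7b-8k-base"  # assumed Hub repo ID

# The released XGen tokenizer is custom (tiktoken-based), hence
# trust_remote_code=True (assumed).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Large Language Models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 8K-sequence variant is the one highlighted by the report's long-sequence evaluations; shorter-context base and instruction-tuned (XGen-Inst) variants would be loaded the same way under different repo IDs.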
