XGen-7B Technical Report
September 7, 2023
Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
cs.AI
Abstract
Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research.
However, most high-performing LLMs remain confined behind proprietary walls,
hindering scientific progress. Most open-source LLMs, on the other hand, are
limited in their ability to support longer sequence lengths, which is a key
requirement for many tasks that require inference over an input context. To
address this, we have trained XGen, a series of 7B parameter models on up to 8K
sequence length for up to 1.5T tokens. We have also finetuned the XGen models
on public-domain instructional data, creating their instruction-tuned
counterparts (XGen-Inst). We open-source our models for both research
advancements and commercial applications. Our evaluation on standard benchmarks
shows that XGen models achieve comparable or better results when compared with
state-of-the-art open-source LLMs. Our targeted evaluation on long sequence
modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence
open-source LLMs.
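
Since the report states that the models are open-sourced for research and commercial use, a minimal sketch of loading a released checkpoint with Hugging Face `transformers` might look like the following. The checkpoint name `Salesforce/xgen-7b-8k-base` and the `trust_remote_code=True` tokenizer flag are assumptions based on the public release, not details taken from this abstract:

```python
# Minimal sketch: loading an XGen checkpoint via Hugging Face transformers.
# Assumes the public checkpoint name "Salesforce/xgen-7b-8k-base"; the
# instruction-tuned variant (XGen-Inst) would load the same way under its name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is assumed here because the XGen release ships a
# custom tokenizer with the checkpoint rather than a standard built-in class.
tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/xgen-7b-8k-base", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/xgen-7b-8k-base", torch_dtype=torch.bfloat16
)

# Generate a short continuation; the 8K-token context window described in the
# abstract means much longer prompts than this would also fit.
inputs = tokenizer("Large Language Models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```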