XGen-7B Technical Report
September 7, 2023
Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
cs.AI
Abstract
Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research.
However, most high-performing LLMs remain confined behind proprietary walls,
hindering scientific progress. Most open-source LLMs, on the other hand, are
limited in their ability to support longer sequence lengths, which is a key
requirement for many tasks that require inference over an input context. To
address this, we have trained XGen, a series of 7B parameter models on up to 8K
sequence length for up to 1.5T tokens. We have also finetuned the XGen models
on public-domain instructional data, creating their instruction-tuned
counterparts (XGen-Inst). We open-source our models for both research
advancements and commercial applications. Our evaluation on standard benchmarks
shows that XGen models achieve comparable or better results when compared with
state-of-the-art open-source LLMs. Our targeted evaluation on long sequence
modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence
open-source LLMs.
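
Since the report states that the models are open-sourced for research and commercial use, a minimal sketch of loading a released checkpoint with Hugging Face `transformers` might look like the following. The checkpoint name `Salesforce/xgen-7b-8k-base` and the `trust_remote_code=True` tokenizer flag are assumptions based on the public release, not details taken from this abstract:

```python
# Minimal sketch: loading an XGen checkpoint via Hugging Face transformers.
# Assumes the public checkpoint name "Salesforce/xgen-7b-8k-base"; the
# instruction-tuned variant (XGen-Inst) would load the same way under its name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is assumed here because the XGen release ships a
# custom tokenizer with the checkpoint rather than a standard built-in class.
tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/xgen-7b-8k-base", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/xgen-7b-8k-base", torch_dtype=torch.bfloat16
)

# Generate a short continuation; the 8K-token context window described in the
# abstract means much longer prompts than this would also fit.
inputs = tokenizer("Large Language Models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```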