Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

August 22, 2024
Authors: Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar, Itay Dalmedigos, Jhonathan Osin, Julie Fadlon, Maria Rozman, Matan Danos, Michael Gokhman, Mor Zusman, Naama Gidron, Nir Ratner, Noam Gat, Noam Rozen, Oded Fried, Ohad Leshno, Omer Antverg, Omri Abend, Opher Lieber, Or Dagan, Orit Cohavi, Raz Alon, Ro'i Belson, Roi Cohen, Rom Gilad, Roman Glozman, Shahar Lev, Shaked Meirom, Tal Delbari, Tal Ness, Tomer Asida, Tom Ben Gal, Tom Braude, Uriya Pumerantz, Yehoshua Cohen, Yonatan Belinkov, Yuval Globerson, Yuval Peleg Levy, Yoav Shoham
cs.AI

Abstract

We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture-of-experts architecture, providing high throughput and low memory usage across context lengths, while retaining quality on par with or better than Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with eight 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License, and we release ExpertsInt8 as open source.
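
The abstract names ExpertsInt8 but does not spell out its mechanics here. As a rough, non-authoritative illustration of the general idea it builds on (storing MoE expert weights in INT8 to shrink the serving-time memory footprint), the sketch below shows symmetric per-channel INT8 quantization of a single expert weight matrix in PyTorch. The function names and the weight shape are hypothetical and are not taken from the paper.

```python
import torch

def quantize_expert_int8(weight: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of one expert weight matrix.

    Returns the INT8 weights plus per-row scales, so the original values can be
    approximately recovered (or the scales fused into the matmul) at inference time.
    """
    # Scale each output row so that its largest magnitude maps to 127.
    scales = weight.abs().amax(dim=1, keepdim=True) / 127.0
    scales = torch.clamp(scales, min=1e-8)  # guard against all-zero rows
    q = torch.clamp(torch.round(weight / scales), -128, 127).to(torch.int8)
    return q, scales

def dequantize_expert_int8(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Recover a float approximation of the original expert weights."""
    return q.to(torch.float32) * scales

if __name__ == "__main__":
    # Hypothetical expert FFN weight shape; not taken from the paper.
    w = torch.randn(4096, 14336)
    q, s = quantize_expert_int8(w)
    w_hat = dequantize_expert_int8(q, s)
    print(f"max abs error: {(w - w_hat).abs().max().item():.5f}")
    print(f"memory: fp32 {w.numel() * 4 / 2**20:.0f} MiB -> int8 {q.numel() / 2**20:.0f} MiB")
```

Halving the expert weight footprint relative to 16-bit storage is the kind of saving that makes the single-node, eight-GPU deployment described above plausible at 256K-token contexts, though the paper's actual technique and serving integration may differ from this sketch.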
