

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

August 16, 2024
作者: Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, Ran Xu
cs.AI

Abstract

This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks. Our pre-trained base model exhibits strong in-context learning capabilities, and the instruction-tuned model demonstrates competitive performance among open-source LMMs of similar model sizes. In addition, we introduce a safety-tuned model with DPO, aiming to mitigate harmful behaviors such as hallucinations and improve safety. We open-source our models, curated large-scale datasets, and our fine-tuning codebase to facilitate further advancements in LMM research. Associated resources will be available on our project page.
