
MM-LLMs: Recent Advances in MultiModal Large Language Models

January 24, 2024
作者: Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu
cs.AI

Abstract

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Specifically, we first outline general design formulations for model architecture and training pipeline. Subsequently, we provide brief introductions of 26 existing MM-LLMs, each characterized by its specific formulations. Additionally, we review the performance of MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Lastly, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.
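The abstract notes that the survey first outlines general design formulations for MM-LLM architecture. As a rough illustration of the commonly used pipeline (modality encoder → trainable input projector → LLM backbone), here is a minimal numpy sketch; all dimensions, the random stand-in features, and the single linear projector are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for the sketch (hypothetical values).
d_vision, d_llm = 768, 4096   # encoder output dim, LLM embedding dim
n_patches, n_tokens = 16, 8   # image patches, text tokens

# 1. Modality encoder (typically frozen): a stand-in producing patch features.
image_features = rng.standard_normal((n_patches, d_vision))

# 2. Input projector (trainable): maps vision features into the LLM's
#    token-embedding space; a single linear layer is the simplest choice.
W_proj = rng.standard_normal((d_vision, d_llm)) * 0.02
visual_tokens = image_features @ W_proj          # (n_patches, d_llm)

# 3. Text tokens embedded by the LLM backbone (frozen or lightly tuned).
text_tokens = rng.standard_normal((n_tokens, d_llm))

# 4. Prepend the projected visual tokens to the text sequence and feed the
#    result to the LLM as ordinary input embeddings.
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (24, 4096)
```

The cost-effective training strategies the abstract refers to typically keep the encoder and LLM frozen and train only the small projector, which is why this connector is the architectural component most surveys highlight.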