
MM-LLMs: Recent Advances in MultiModal Large Language Models

January 24, 2024
Authors: Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu
cs.AI

Abstract

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Specifically, we first outline general design formulations for model architecture and training pipeline. Subsequently, we provide brief introductions of 26 existing MM-LLMs, each characterized by its specific formulations. Additionally, we review the performance of MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Lastly, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.