Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
March 21, 2024
Authors: Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang
cs.AI
Abstract
In recent years, multimodal large language models (MLLMs) have achieved
remarkable success across a wide range of fields. However, current MLLMs, which
serve as foundation models for many downstream tasks, are built on the
well-known Transformer network, whose computational complexity is quadratic in
sequence length and therefore less efficient. To improve the efficiency of such
foundation models, we propose Cobra, an MLLM with linear computational
complexity. Specifically, Cobra extends the efficient Mamba language model to
the visual modality. Moreover, we explore and study various modality fusion
schemes to create an effective multimodal Mamba. Extensive experiments
demonstrate that (1) Cobra achieves performance highly competitive with current
computationally efficient state-of-the-art methods, e.g., LLaVA-Phi, TinyLLaVA,
and MobileVLM v2, while running faster thanks to its linear sequential modeling.
(2) Interestingly, results on challenging closed-set prediction benchmarks show
that Cobra performs well at overcoming visual illusions and judging spatial
relationships. (3) Notably, Cobra even achieves performance comparable to LLaVA
with only about 43% of its parameter count. We will open-source all of Cobra's
code and hope that the proposed method can facilitate future research on
complexity problems in MLLMs. Our project page is available at:
https://sites.google.com/view/cobravlm.
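The following minimal Python sketch makes the linear-versus-quadratic contrast in the abstract concrete. It is not code from Cobra or Mamba. The sketch assumes a simplified, time-invariant diagonal state-space recurrence with scalar inputs. Mamba's actual layer differs in two ways: it is selective, meaning its parameters depend on the input, and it is computed with a hardware-aware parallel scan. The names `ssm_scan`, `attention_pair_count`, and `ssm_update_count` are hypothetical and chosen only for illustration.

```python
def ssm_scan(x, a, b, c):
    """Causal scan of a simplified, time-invariant diagonal linear SSM (illustrative only).

    For each scalar token x_t and N state channels:
        h_t[n] = a[n] * h_{t-1}[n] + b[n] * x_t
        y_t    = sum_n c[n] * h_t[n]
    Each step costs O(N) regardless of t, so a length-L sequence costs O(L * N).
    """
    n = len(a)
    h = [0.0] * n  # hidden state carried from one token to the next
    y = []
    for x_t in x:
        # State update: only the previous state is needed, never earlier tokens.
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        # Readout for this position.
        y.append(sum(c[i] * h[i] for i in range(n)))
    return y


def attention_pair_count(seq_len):
    """Query-key scores computed by causal self-attention: L(L+1)/2, i.e. O(L^2)."""
    return seq_len * (seq_len + 1) // 2


def ssm_update_count(seq_len):
    """State updates performed by ssm_scan: one per token, i.e. O(L)."""
    return seq_len


if __name__ == "__main__":
    for L in (1024, 2048, 4096):
        print(L, ssm_update_count(L), attention_pair_count(L))
```

Doubling the sequence length from 2048 to 4096 tokens doubles the recurrent updates (2048 to 4096). Over the same change, the attention scores roughly quadruple (2,098,176 to 8,390,656). These counts ignore head dimension and model width, which affect each model by a factor independent of sequence length.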