

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

March 21, 2024
Authors: Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang
cs.AI

Abstract

In recent years, multimodal large language models (MLLMs) have achieved remarkable success across a variety of fields. However, current MLLMs, which serve as foundation models for many downstream tasks, are built on the well-known Transformer network, whose attention mechanism incurs quadratic computational complexity in sequence length. To improve the efficiency of such foundation models, we propose Cobra, an MLLM with linear computational complexity. Specifically, Cobra integrates the efficient Mamba language model into the visual modality. Moreover, we explore and study various modality fusion schemes to create an effective multimodal Mamba. Extensive experiments demonstrate that (1) Cobra achieves highly competitive performance against current computationally efficient state-of-the-art methods, e.g., LLaVA-Phi, TinyLLaVA, and MobileVLM v2, and runs faster thanks to its linear sequence modeling; (2) interestingly, results on challenging closed-set prediction benchmarks show that Cobra performs well at overcoming visual illusions and judging spatial relationships; (3) notably, Cobra achieves performance comparable to LLaVA with only about 43% of the parameters. We will open-source all of Cobra's code and hope the proposed method can facilitate future research on the complexity problems of MLLMs. Our project page is available at: https://sites.google.com/view/cobravlm.
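To make the described architecture concrete, below is a minimal PyTorch sketch of a Cobra-style multimodal pipeline: visual features from a vision encoder are projected into the language model's embedding space, prepended to the text token embeddings, and processed by a linear-time sequence backbone. This is a sketch under stated assumptions, not the authors' implementation: the names (`CobraStyleVLM`, `LinearTimeMixer`) and dimensions are hypothetical, and the toy recurrent scan only stands in for a real Mamba block to illustrate the per-layer O(L) cost.

```python
# Hypothetical sketch of a Cobra-style MLLM forward pass (not the authors' code).
# The LinearTimeMixer below is a toy stand-in for a Mamba block: it updates a
# fixed-size state once per token, so cost grows linearly in sequence length.
import torch
import torch.nn as nn

class LinearTimeMixer(nn.Module):
    """Toy recurrent scan: O(L) in sequence length, unlike O(L^2) attention."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # per-channel decay
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D)
        u = self.in_proj(x)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        outs = []
        for t in range(x.size(1)):       # one state update per token -> linear cost
            h = self.decay * h + u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class CobraStyleVLM(nn.Module):
    def __init__(self, vis_dim: int = 512, dim: int = 768, vocab: int = 32000):
        super().__init__()
        self.projector = nn.Linear(vis_dim, dim)  # vision features -> LM space
        self.embed = nn.Embedding(vocab, dim)
        self.backbone = nn.Sequential(LinearTimeMixer(dim), LinearTimeMixer(dim))
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, image_feats: torch.Tensor, input_ids: torch.Tensor):
        # image_feats: (B, N, vis_dim) patch features from a vision encoder
        vis_tokens = self.projector(image_feats)
        txt_tokens = self.embed(input_ids)
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)  # prepend visual tokens
        return self.lm_head(self.backbone(seq))

# Usage sketch with random inputs:
model = CobraStyleVLM()
logits = model(torch.randn(1, 16, 512), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 32000])
```

Because the mixer carries a fixed-size state forward one token at a time, total compute scales linearly with the combined visual-plus-text sequence length, which is the efficiency property the abstract contrasts with the quadratic cost of Transformer self-attention.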