MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
February 6, 2024
Authors: Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen
cs.AI
Abstract
We introduce MobileVLM V2, a family of significantly improved vision language models built upon MobileVLM, demonstrating that a careful orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich, high-quality dataset curation can substantially boost VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a wide variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM.