Imp: Highly Capable Large Multimodal Models for Mobile Devices
May 20, 2024
Authors: Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding
cs.AI
Abstract
By harnessing the capabilities of large language models (LLMs), recent large
multimodal models (LMMs) have shown remarkable versatility in open-world
multimodal understanding. Nevertheless, they are usually parameter-heavy and
computation-intensive, thus hindering their applicability in
resource-constrained scenarios. To this end, several lightweight LMMs have been
proposed successively to maximize the capabilities under constrained scale
(e.g., 3B). Despite the encouraging results achieved by these methods, most of
them only focus on one or two aspects of the design space, and the key design
choices that influence model capability have not yet been thoroughly
investigated. In this paper, we conduct a systematic study for lightweight LMMs
from the aspects of model architecture, training strategy, and training data.
Based on our findings, we obtain Imp -- a family of highly capable LMMs at the
2B-4B scales. Notably, our Imp-3B model steadily outperforms all the existing
lightweight LMMs of similar size, and even surpasses the state-of-the-art LMMs
at the 13B scale. With low-bit quantization and resolution reduction
techniques, our Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile
chip with a high inference speed of about 13 tokens/s.
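The abstract credits low-bit quantization (together with resolution reduction) for making on-device deployment feasible. As a rough illustration of what low-bit weight quantization involves, the sketch below shows generic symmetric per-channel 4-bit quantization in NumPy; the function names and the int4 scheme are assumptions chosen for illustration and do not describe Imp's actual deployment pipeline.

```python
# Illustrative sketch only: symmetric per-channel 4-bit weight quantization,
# a common way to shrink model weights for on-device inference. This is NOT
# the exact scheme used by Imp; names and parameters here are assumptions.
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Quantize a 2-D weight matrix to signed 4-bit integers, per output row."""
    # Scale each row so that its largest magnitude maps to the int4 limit (7).
    max_abs = np.abs(weights).max(axis=1, keepdims=True)
    scale = np.where(max_abs == 0, 1.0, max_abs / 7.0)
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from int4 values and per-row scales."""
    return q.astype(np.float32) * scale

# Example: a toy 4x8 weight matrix round-trips with small reconstruction error.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

Storing weights as 4-bit integers plus per-row scales cuts memory traffic roughly fourfold relative to fp16, which is the kind of saving that makes inference on a mobile chip such as the Snapdragon 8Gen3 practical.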