InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
July 3, 2024
Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
cs.AI
Abstract
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language
model that supports long-contextual input and output. IXC-2.5 excels
in various text-image comprehension and composition applications, achieving
GPT-4V-level capabilities with merely a 7B LLM backend. Trained with 24K
interleaved image-text contexts, it can seamlessly extend to 96K long contexts
via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in
tasks requiring extensive input and output contexts. Compared to its previous
2.0 version, InternLM-XComposer-2.5 features three major upgrades in
vision-language comprehension: (1) Ultra-High Resolution Understanding, (2)
Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In
addition to comprehension, IXC-2.5 extends to two compelling applications using
extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2)
Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28
benchmarks, outperforming existing open-source state-of-the-art models on 16
benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on
16 key tasks. InternLM-XComposer-2.5 is publicly available at
https://github.com/InternLM/InternLM-XComposer.
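
The abstract says the model is trained on 24K contexts and extends to 96K via RoPE extrapolation, but it does not say which extrapolation scheme is used. The sketch below is therefore not the authors' implementation. It only illustrates the basic property of rotary position embeddings (RoPE) that makes extrapolation conceivable: the attention score between a query and a key depends on their relative offset, so positions beyond the training length remain well defined. The function names, the toy vectors, and the base of 10000 (a common default) are assumptions made for illustration.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    `base` is an assumed default; the abstract does not state the value used.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of dimensions rotates at its own frequency.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))

TRAIN_LEN = 24_000   # training context length, per the abstract
TARGET_LEN = 96_000  # extrapolated context length, per the abstract

# Toy query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (100 tokens): once inside the training range,
# once near the end of the extrapolated range.
score_inside = dot(rope_rotate(q, 1_100), rope_rotate(k, 1_000))
score_beyond = dot(rope_rotate(q, TARGET_LEN - 1), rope_rotate(k, TARGET_LEN - 101))

print(score_inside, score_beyond)
```

The two scores agree up to floating-point error because only the offset between the two positions enters the dot product. This alone does not guarantee good quality at 96K: in practice, long-context extension usually also involves adjusting the rotation frequencies or the base, and the abstract gives no detail on that step.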