InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

July 3, 2024
Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
cs.AI

Abstract

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications for text-image composition using extra LoRA parameters: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 of them. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
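The abstract credits the jump from 24K-token training contexts to 96K-token inference contexts to RoPE extrapolation, but it does not say how that extrapolation is configured. As background only, here is a minimal pure-Python sketch of standard rotary position embedding (RoPE) in its original RoFormer form. It is not IXC-2.5's code: the function names, the 8-dimensional toy vectors, and the base of 10000 are illustrative assumptions. The sketch shows two properties that let a RoPE model run past its training length at all. The rotation is defined for any integer position, and the query-key score depends only on the relative offset between positions. It says nothing about whether model quality holds at long range; that is what the paper's benchmarks measure.

```python
import math
from typing import List, Sequence


def rope_frequencies(dim: int, base: float = 10000.0) -> List[float]:
    """Rotation frequency for each (even, odd) pair: theta_i = base ** (-2i / dim).

    base=10000 is the RoFormer default, used here as an assumption; the
    abstract does not state which base IXC-2.5 uses.
    """
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def apply_rope(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * theta_i.

    Nothing here caps `position`, so positions far beyond the longest
    training sequence still yield a well-defined rotation.
    """
    out: List[float] = []
    for i, theta in enumerate(rope_frequencies(len(x), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def attention_score(q: Sequence[float], k: Sequence[float], q_pos: int, k_pos: int) -> float:
    """Unscaled attention logit: dot product of a rotated query and a rotated key."""
    rq, rk = apply_rope(q, q_pos), apply_rope(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


if __name__ == "__main__":
    # Hypothetical toy query/key vectors (head dimension 8).
    q = [0.3, -1.2, 0.8, 0.5, -0.7, 0.1, 1.0, -0.4]
    k = [1.1, 0.2, -0.6, 0.9, 0.4, -0.3, 0.7, 0.05]
    # Same relative offset (3) near the start and close to the 96K mark.
    near = attention_score(q, k, q_pos=5, k_pos=2)
    far = attention_score(q, k, q_pos=95_005, k_pos=95_002)
    print(f"offset 3 near start: {near:.6f}")
    print(f"offset 3 near 96K:   {far:.6f}")
```

The two printed scores agree up to floating-point error, because the score reduces to a function of `q_pos - k_pos`. Pairing adjacent dimensions follows the RoFormer formulation. Many libraries instead rotate the first half of the vector against the second half, which is equivalent up to a permutation of dimensions. Context-extension variants such as position interpolation or NTK-aware base scaling change the positions or the base fed into this same formula; the abstract does not say whether IXC-2.5 uses either.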