InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Abstract

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-context input and output. IXC-2.5 excels in a range of text-image comprehension and composition applications, achieving GPT-4V-level capabilities with only a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can extend to 96K-token contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to perform well on tasks that require extensive input and output contexts. Compared to the previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. Beyond comprehension, IXC-2.5 uses extra LoRA parameters to support two text-image composition applications: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks and outperforms existing open-source state-of-the-art models on 16 of them. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
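The abstract does not say which extrapolation scheme is used, so the following is only a minimal sketch of standard rotary position embedding (RoPE), not the IXC-2.5 implementation. It illustrates why extending from 24K to 96K is possible in principle: the rotation is defined for any position index, and the query-key score depends only on the relative offset between positions. The function names, the base of 10000, the toy vectors, and the reading of "24K" and "96K" as multiples of 1024 are all assumptions made for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a standard rotary position embedding to an even-length vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each coordinate pair rotates at its own frequency base^(-i/dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumption: "24K" and "96K" in the abstract are read as multiples of 1024.
TRAIN_LEN = 24 * 1024
TARGET_LEN = 96 * 1024

# Toy query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset of 100 tokens, once inside the training range
# and once near the end of the extended range.
score_inside = dot(rope_rotate(q, 1000), rope_rotate(k, 900))
score_beyond = dot(rope_rotate(q, TARGET_LEN - 1), rope_rotate(k, TARGET_LEN - 101))
```

The two scores agree up to floating-point error, which shows that the encoding itself is well defined past the training length. Whether model quality holds at those positions is an empirical question that the reported benchmarks address; variants that rescale positions or the frequency base are common, and this sketch does not model them.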
