InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
July 3, 2024
Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
cs.AI
Abstract
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large
vision-language model that supports long-contextual input and output. IXC-2.5 excels
in various text-image comprehension and composition applications, achieving
GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K
interleaved image-text contexts, it can seamlessly extend to 96K long contexts
via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in
tasks requiring extensive input and output contexts. Compared to its previous
2.0 version, InternLM-XComposer-2.5 features three major upgrades in
vision-language comprehension: (1) Ultra-High Resolution Understanding, (2)
Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In
addition to comprehension, IXC-2.5 extends to two compelling applications using
extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2)
Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28
benchmarks, outperforming existing open-source state-of-the-art models on 16
benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on
16 key tasks. The InternLM-XComposer-2.5 is publicly available at
https://github.com/InternLM/InternLM-XComposer.
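The abstract states that a model trained on 24K-token interleaved contexts extends to 96K via RoPE extrapolation. As a rough illustration of the idea, the sketch below uses linear position interpolation (positions beyond the trained window are compressed back into the trained angle range); this is one common extrapolation scheme and is an assumption here, not necessarily the exact method used by IXC-2.5, and the 24K/96K token counts are taken as 24576/98304.

```python
# Hypothetical sketch of RoPE position extrapolation via linear
# position interpolation. Assumption: not necessarily IXC-2.5's scheme.
def rope_angles(position, dim, base=10000.0,
                train_len=24576, target_len=98304):
    """Rotation angles for one token position in rotary embeddings.

    Positions beyond the trained window are scaled down by
    train_len / target_len so every angle stays within the range
    seen during 24K-context training.
    """
    scale = train_len / target_len  # 24576 / 98304 = 0.25
    pos = position * scale
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# A position near the end of the 96K window maps to the same angles
# as a 4x-smaller position seen during 24K training.
angles_far = rope_angles(98300, dim=64)
angles_equiv = rope_angles(24575, dim=64, train_len=24576,
                           target_len=24576)
```

The key property is that inference-time positions never produce rotation frequencies outside the trained distribution, which is what allows the context window to grow without retraining.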
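The composition applications (webpages, articles) are attached through extra LoRA parameters. As a generic sketch of how LoRA adds a task-specific skill without modifying the frozen base weights, the example below applies a low-rank update `B @ A` alongside a frozen weight `W`; the shapes and rank are illustrative assumptions, not IXC-2.5's actual configuration.

```python
# Minimal generic LoRA sketch (illustrative shapes, not IXC-2.5's).
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 8

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection

def forward(x):
    # Base path plus the low-rank adapter path; only A and B would be
    # trained for the new composition task.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter contributes nothing at first,
# so the adapted model starts out identical to the base model.
assert np.allclose(forward(x), W @ x)
```

Initializing `B` to zero is the standard LoRA choice: training begins from the unmodified base model, and the adapter gradually learns the task-specific delta.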