Seed1.5-VL Technical Report
May 11, 2025
Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei Li, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, Zihao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song
cs.AI
Abstract
We present Seed1.5-VL, a vision-language foundation model designed to advance
general-purpose multimodal understanding and reasoning. Seed1.5-VL consists of
a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B
active parameters. Despite its relatively compact architecture, it delivers
strong performance across a wide spectrum of public VLM benchmarks and internal
evaluation suites, achieving state-of-the-art performance on 38 of 60
public benchmarks. Moreover, in agent-centric tasks such as GUI control and
gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI
CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates
strong reasoning abilities, making it particularly effective for multimodal
reasoning challenges such as visual puzzles. We believe these capabilities will
empower broader applications across diverse tasks. In this report, we provide
a comprehensive review of our experience building Seed1.5-VL, covering model
design, data construction, and training at various stages, in the hope of
inspiring further research. Seed1.5-VL is now accessible at
https://www.volcengine.com/ (Volcano Engine Model ID:
doubao-1-5-thinking-vision-pro-250428).