MiniCPM4: Ultra-Efficient LLMs on End Devices

June 9, 2025
Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, Yishan Li, Zhen Li, Dan Liu, Biyuan Lin, Yankai Lin, Xiang Long, Quanyu Lu, Yaxi Lu, Peiyan Luo, Hongya Lyu, Litu Ou, Yinxu Pan, Zekai Qu, Qundong Shi, Zijun Song, Jiayuan Su, Zhou Su, Ao Sun, Xianghui Sun, Peijun Tang, Fangzheng Wang, Feng Wang, Shuo Wang, Yudong Wang, Yesai Wu, Zhenyu Xiao, Jie Xie, Zihao Xie, Yukun Yan, Jiarui Yuan, Kaihuo Zhang, Lei Zhang, Linyue Zhang, Xueren Zhang, Yudi Zhang, Hengyu Zhao, Weilin Zhao, Weilun Zhao, Yuanqian Zhao, Zhi Zheng, Ge Zhou, Jie Zhou, Wei Zhou, Zihan Zhou, Zixuan Zhou, Zhiyuan Liu, Guoyang Zeng, Chao Jia, Dahai Li, Maosong Sun
cs.AI

Abstract

This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. In terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both the prefilling and decoding phases of long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate strategy for pre-training data filtering and generation, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved with only 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and we improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol, clearly showcasing its broad usability.
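To make the sparse-attention idea concrete, below is a minimal PyTorch sketch of block-sparse decoding in the spirit of what the abstract describes: the key/value cache is split into fixed-size blocks, each block is scored against the current query via a pooled representative key, and dense attention is computed only over the top-scoring blocks. This is an illustration only, not the InfLLM v2 algorithm; the block size, the top-k budget, and the mean-pooled block representatives are assumptions made for the example.

```python
# Illustrative block-sparse attention for a single decoding step (NOT InfLLM v2;
# block_size, top_k, and mean-pooled block representatives are assumptions).
import torch

def block_sparse_attention(q, k, v, block_size=64, top_k=8):
    """q: (d,) current query; k, v: (T, d) cached keys/values.
    Assumes T is a multiple of block_size to keep the sketch short."""
    T, d = k.shape
    n_blocks = T // block_size
    k_blocks = k.view(n_blocks, block_size, d)
    v_blocks = v.view(n_blocks, block_size, d)

    # Score each block by a pooled representative key against the query,
    # then keep only the highest-scoring blocks.
    reps = k_blocks.mean(dim=1)                      # (n_blocks, d)
    chosen = (reps @ q).topk(min(top_k, n_blocks)).indices

    # Dense attention restricted to the selected blocks.
    k_sel = k_blocks[chosen].reshape(-1, d)          # (<= top_k * block_size, d)
    v_sel = v_blocks[chosen].reshape(-1, d)
    weights = torch.softmax((k_sel @ q) / d ** 0.5, dim=0)
    return weights @ v_sel                           # (d,)

# Example: 4,096 cached tokens, but each step only touches 8 blocks of 64.
d, T = 128, 4096
q, k, v = torch.randn(d), torch.randn(T, d), torch.randn(T, d)
out = block_sparse_attention(q, k, v)
```

Under these illustrative settings, each decoding step attends to at most top_k * block_size = 512 cached positions instead of all 4,096, which is the kind of saving that makes long-context prefilling and decoding cheaper on end-side devices.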