Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
November 4, 2024
著者: Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Xuemeng Huang, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Tao Yang, Suncong Zheng, Kan Wu, Dian Jiao, Jinbao Xue, Xipeng Zhang, Decheng Wu, Kai Liu, Dengpeng Wu, Guanghui Xu, Shaohua Chen, Shuang Chen, Xiao Feng, Yigeng Hong, Junqiang Zheng, Chengcheng Xu, Zongwei Li, Xiong Kuang, Jianglu Hu, Yiqi Chen, Yuchi Deng, Guiyang Li, Ao Liu, Chenchen Zhang, Shihui Hu, Zilong Zhao, Zifan Wu, Yao Ding, Weichao Wang, Han Liu, Roberts Wang, Hao Fei, Peijie She, Ze Zhao, Xun Cao, Hai Wang, Fusheng Xiang, Mengyuan Huang, Zhiyuan Xiong, Bin Hu, Xuebin Hou, Lei Jiang, Jiajia Wu, Yaping Deng, Yi Shen, Qian Wang, Weijie Liu, Jie Liu, Meng Chen, Liang Dong, Weiwen Jia, Hu Chen, Feifei Liu, Rui Yuan, Huilin Xu, Zhenxiang Yan, Tengfei Cao, Zhichao Hu, Xinhua Feng, Dong Du, Tinghao She, Yangyu Tao, Feng Zhang, Jianchen Zhu, Chengzhong Xu, Xirui Li, Chong Zha, Wen Ouyang, Yinben Xia, Xiang Li, Zekun He, Rongpeng Chen, Jiawei Song, Ruibin Chen, Fan Jiang, Chongqing Zhao, Bo Wang, Hao Gong, Rong Gan, Winston Hu, Zhanhui Kang, Yong Yang, Yuhong Liu, Di Wang, Jie Jiang
cs.AI
Abstract
In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms Llama3.1-70B and exhibits performance comparable to the significantly larger Llama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedules of mixture-of-experts models, providing valuable insights and guidance for future model development and optimization. We release the code and checkpoints of Hunyuan-Large to facilitate future innovations and applications.
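The abstract credits part of the model's efficiency to a mixed expert routing strategy, in which a shared path processes every token while a router dispatches each token to a small number of specialized experts. The sketch below is a minimal, hypothetical illustration of such a layer, not Hunyuan-Large's implementation: the class name `MixedRoutingMoE`, all dimensions, and the expert count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedRoutingMoE(nn.Module):
    """Hypothetical mixed-routing MoE layer: a shared expert handles every
    token, and a learned router adds each token's top-k specialized expert
    outputs. All sizes are toy values, not Hunyuan-Large's configuration."""

    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=1):
        super().__init__()
        make_ffn = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.shared = make_ffn()                      # always active
        self.experts = nn.ModuleList(make_ffn() for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)   # token -> expert scores
        self.top_k = top_k

    def forward(self, x):                             # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)     # routing probabilities
        weight, idx = probs.topk(self.top_k, dim=-1)  # per-token expert picks
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            chosen = (idx == e)                       # (n_tokens, top_k) mask
            tok = chosen.any(dim=-1)                  # tokens routed to e
            if tok.any():
                w = (weight * chosen).sum(dim=-1, keepdim=True)[tok]
                routed[tok] = w * expert(x[tok])      # weighted expert output
        return self.shared(x) + routed                # shared + routed paths

layer = MixedRoutingMoE()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Production systems typically also renormalize the top-k routing weights and add a load-balancing auxiliary loss; both are omitted here for brevity.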
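The key-value cache compression the abstract mentions matters most at the model's 256K-token context length. The snippet below is back-of-the-envelope arithmetic rather than the paper's method: it compares the cache footprint of standard multi-head attention against grouped-query attention (GQA), one widely used compression ingredient, using illustrative layer and head counts rather than Hunyuan-Large's actual configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, dtype_bytes=2):
    """Bytes needed to cache attention keys and values: two tensors
    (K and V), each of shape (batch, seq_len, n_kv_heads, head_dim),
    stored per layer."""
    return 2 * n_layers * batch * seq_len * n_kv_heads * head_dim * dtype_bytes

# Illustrative configuration (NOT Hunyuan-Large's actual sizes):
# 64 layers, head_dim 128, a 256K-token context, fp16 activations.
mha = kv_cache_bytes(64, n_kv_heads=64, head_dim=128, seq_len=256 * 1024)
gqa = kv_cache_bytes(64, n_kv_heads=8,  head_dim=128, seq_len=256 * 1024)
print(f"MHA cache: {mha / 2**30:.1f} GiB")  # 512.0 GiB
print(f"GQA cache: {gqa / 2**30:.1f} GiB")  # 64.0 GiB, an 8x reduction
```

Shrinking the number of cached KV heads is what makes very long contexts tractable: the cache grows linearly with sequence length, so any per-token saving is multiplied by the full 256K-token window.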
Code: https://github.com/Tencent/Hunyuan-Large
Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large