H2O-Danube3 Technical Report

July 12, 2024
Authors: Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati
cs.AI

Abstract

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality web data consisting primarily of English tokens, in three stages with different data mixes, before final supervised tuning for the chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be run efficiently on a modern smartphone, enabling local inference and rapid processing even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs economically for a wider audience.
