H2O-Danube3 Technical Report

July 12, 2024
Authors: Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati
cs.AI

Abstract

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality web data consisting primarily of English tokens, in three stages with different data mixes, before final supervised tuning for the chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be run efficiently on a modern smartphone, enabling local inference and rapid processing even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs economically to a wider audience.
