Slamming: Training a Speech Language Model on One GPU in a Day
February 19, 2025
Authors: Gallil Maimon, Avishai Elmakies, Yossi Adi
cs.AI
Abstract
We introduce Slam, a recipe for training high-quality Speech Language Models
(SLMs) on a single academic GPU in 24 hours. We do so through empirical
analysis of model initialisation and architecture, synthetic training data,
preference optimisation with synthetic data, and tuning of all other
components. We empirically demonstrate that this training recipe also scales
well with more compute, achieving results on par with leading SLMs at a
fraction of the compute cost. We hope these insights will make SLM training
and research more accessible. In the context of SLM scaling laws, our results
far outperform the predicted compute-optimal performance, giving an optimistic
view of SLM feasibility. Code, data, models, and samples are available at
https://pages.cs.huji.ac.il/adiyoss-lab/slamming.
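
To make two of the recipe's ingredients concrete, here is a minimal, illustrative PyTorch sketch of (a) initialising an SLM from a pretrained text LM rather than from random weights and (b) a DPO-style preference loss over synthetic preference pairs. The base-model name, the speech-vocabulary size, and the `dpo_loss` helper are assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative sketch (not the authors' released code).
# Assumptions: the base checkpoint, vocabulary size, and helper below
# are hypothetical stand-ins for the paper's exact choices.
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

N_SPEECH_UNITS = 500  # assumed size of the discrete speech-unit vocabulary

# (a) Initialise from a small pretrained text LM, then swap the text
# vocabulary for discrete speech units by resizing the embeddings
# (and the tied LM head). Training then proceeds as ordinary
# next-token prediction over speech units.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model.resize_token_embeddings(N_SPEECH_UNITS)

# (b) DPO-style preference loss for one (chosen, rejected) pair of
# speech continuations, given summed per-sequence log-probabilities
# from the policy and a frozen reference model.
def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * log-ratio margin)."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()
```

The sketch mirrors the abstract's claims at a high level: text-LM initialisation supplies a strong starting point on a small compute budget, and preference optimisation over synthetic data refines the model without additional human annotation.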