Slamming: Training a Speech Language Model on One GPU in a Day
February 19, 2025
Authors: Gallil Maimon, Avishai Elmakies, Yossi Adi
cs.AI
Abstract
We introduce Slam, a recipe for training high-quality Speech Language Models
(SLMs) on a single academic GPU in 24 hours. We do so through empirical
analysis of model initialisation and architecture, synthetic training data,
preference optimisation with synthetic data, and tuning of all other
components. We empirically demonstrate that this training recipe also scales
well with more compute, achieving results on par with leading SLMs at a
fraction of the compute cost. We hope these insights will make SLM training
and research more accessible. In the context of SLM scaling laws, our results
far outperform the predicted compute-optimal performance, giving an optimistic
view of SLM feasibility. Code, data, models, and samples are available at
https://pages.cs.huji.ac.il/adiyoss-lab/slamming.
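
To make two of the recipe's ingredients concrete, here is a minimal, illustrative PyTorch sketch of (a) initialising an SLM from a pretrained text LM rather than from random weights and (b) a DPO-style preference loss over synthetic preference pairs. The base-model name, the speech-vocabulary size, and the `dpo_loss` helper are assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative sketch (not the authors' released code).
# Assumptions: the base checkpoint, vocabulary size, and helper below
# are hypothetical stand-ins for the paper's exact choices.
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

N_SPEECH_UNITS = 500  # assumed size of the discrete speech-unit vocabulary

# (a) Initialise from a small pretrained text LM, then swap the text
# vocabulary for discrete speech units by resizing the embeddings
# (and the tied LM head). Training then proceeds as ordinary
# next-token prediction over speech units.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model.resize_token_embeddings(N_SPEECH_UNITS)

# (b) DPO-style preference loss for one (chosen, rejected) pair of
# speech continuations, given summed per-sequence log-probabilities
# from the policy and a frozen reference model.
def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * log-ratio margin)."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()
```

The sketch mirrors the abstract's claims at a high level: text-LM initialisation supplies a strong starting point on a small compute budget, and preference optimisation over synthetic data refines the model without additional human annotation.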