Slamming: Training a Speech Language Model on One GPU in a Day

Abstract

We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data, and tweaking of all other components. We empirically demonstrate that this training recipe also scales well with more compute, achieving results on par with leading SLMs at a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform the predicted compute-optimal performance, giving an optimistic view of SLM feasibility. Code, data, models, and samples are available at https://pages.cs.huji.ac.il/adiyoss-lab/slamming.
