Slamming: Training a Speech Language Model on One GPU in a Day

Abstract

We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, and preference optimisation with synthetic data, along with tuning of all other components. We empirically demonstrate that this training recipe also scales well with more compute, achieving results on par with leading SLMs at a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform the predicted compute-optimal performance, giving an optimistic view of SLM feasibility. Code, data, models, and samples: https://pages.cs.huji.ac.il/adiyoss-lab/slamming
