A Versatile Framework for Song Generation with Prompt-Based Control

Abstract

Song generation focuses on producing controllable, high-quality songs from various prompts. However, existing methods struggle to generate vocals and accompaniments that are both controllable through prompts and properly aligned with each other. They also fall short in supporting a range of tasks. To address these challenges, we introduce VersBand, a multi-task song generation framework for synthesizing high-quality, aligned songs with prompt-based control. VersBand comprises the following primary models: 1) VocalBand, a decoupled model, leverages the flow-matching method to generate singing styles, pitches, and mel-spectrograms, allowing fast, high-quality vocal generation with style control. 2) AccompBand, a flow-based transformer model, incorporates the Band-MOE, which selects suitable experts for enhanced quality, alignment, and control. This model generates controllable, high-quality accompaniments aligned with the vocals. 3) Two generation models, LyricBand for lyrics and MelodyBand for melodies, complete the multi-task song generation system and allow extensive control based on multiple prompts. Experimental results show that VersBand outperforms baseline models across multiple song generation tasks on both objective and subjective metrics. Audio samples are available at https://VersBand.github.io.