

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

July 21, 2024
Authors: Yun-Han Lan, Wen-Yi Hsiao, Hao-Chung Cheng, Yi-Hsuan Yang
cs.AI

Abstract

Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features such as the chords and rhythm of the generated music. To address this challenge, we introduce MusiConGen, a temporally conditioned, Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our innovation lies in an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically extracted rhythm and chords as the conditioning signal. During inference, the condition can either be musical features extracted from a reference audio signal, or a user-defined symbolic chord sequence, BPM, and textual prompt. Our performance evaluation on two datasets -- one derived from extracted features and the other from user-created inputs -- demonstrates that MusiConGen can generate realistic backing-track music that aligns well with the specified conditions. We open-source the code and model checkpoints, and provide audio examples online at https://musicongen.github.io/musicongen_demo/.
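
As a concrete illustration of the user-defined conditioning mode described above, the sketch below shows one way a symbolic chord progression and a BPM value could be rasterized into frame-level chord and rhythm signals before being fed to the model together with the text prompt. This is a minimal, hypothetical Python sketch: the ChordSpan helper, the function names, and the 50 Hz frame rate (chosen to match MusicGen's EnCodec token rate) are illustrative assumptions, not MusiConGen's actual API.

# Hypothetical sketch: rasterize symbolic conditions (chords + BPM) into
# per-frame labels aligned with an assumed 50 Hz audio-token frame rate.
from dataclasses import dataclass

FRAME_RATE_HZ = 50  # assumption: MusicGen/EnCodec-style token frame rate

@dataclass
class ChordSpan:
    symbol: str  # chord label, e.g. "Am", "F", "C", "G"
    beats: int   # duration of the chord in beats

def chords_to_frames(chords, bpm):
    """Expand a symbolic chord sequence into one chord label per frame."""
    seconds_per_beat = 60.0 / bpm
    frames = []
    for span in chords:
        n_frames = round(span.beats * seconds_per_beat * FRAME_RATE_HZ)
        frames.extend([span.symbol] * n_frames)
    return frames

def beat_onset_frames(bpm, total_frames):
    """Mark frame indices of beat onsets, a simple stand-in for the rhythm condition."""
    frames_per_beat = FRAME_RATE_HZ * 60.0 / bpm
    ticks, t = [], 0.0
    while round(t) < total_frames:
        ticks.append(round(t))
        t += frames_per_beat
    return ticks

progression = [ChordSpan("Am", 4), ChordSpan("F", 4), ChordSpan("C", 4), ChordSpan("G", 4)]
chord_grid = chords_to_frames(progression, bpm=120)
beats = beat_onset_frames(bpm=120, total_frames=len(chord_grid))
print(len(chord_grid), "frames; first beats at frames", beats[:4])
# The frame-aligned chord labels and beat positions would then be encoded and
# passed to the model alongside a text prompt such as "funk backing track".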
