FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
July 29, 2024
Authors: Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley
cs.AI
Abstract
Existing music captioning methods are limited to generating concise global
descriptions of short music clips, which fail to capture fine-grained musical
characteristics and time-aware musical changes. To address these limitations,
we propose FUTGA, a model equipped with fine-grained music understanding
capabilities through learning from generative augmentation with temporal
compositions. We leverage existing music caption datasets and large language
models (LLMs) to synthesize fine-grained music captions with structural
descriptions and time boundaries for full-length songs. Augmented with the
proposed synthetic dataset, FUTGA can identify a song's temporal
changes at key transition points and their musical functions, and can
generate detailed descriptions for each music segment. We further introduce a
full-length music caption dataset generated by FUTGA as an augmentation of
the MusicCaps and Song Describer datasets. We evaluate the automatically
generated captions on several downstream tasks, including music generation and
retrieval. The experiments demonstrate the quality of the generated captions
and the improved performance that the proposed music captioning approach
achieves on various downstream tasks. Our code and datasets are available at
https://huggingface.co/JoshuaW1997/FUTGA.
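
As a rough illustration of what a caption "with structural descriptions and time boundaries" might look like as data, the following Python sketch represents a full-length song caption as a list of time-bounded segments. The class name, field names, helper functions, and example text are all hypothetical; they are not taken from the FUTGA code or dataset, whose actual format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentCaption:
    """One time-bounded segment of a song (hypothetical schema)."""
    start_s: float      # segment start, in seconds
    end_s: float        # segment end, in seconds
    function: str       # musical function, e.g. "intro", "verse", "chorus"
    description: str    # fine-grained description of the segment


def transition_points(segments: List[SegmentCaption]) -> List[float]:
    """Return the start times of every segment after the first, in time order."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [seg.start_s for seg in ordered[1:]]


def format_caption(segments: List[SegmentCaption]) -> str:
    """Render the segments as one caption line per segment, in time order."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return "\n".join(
        f"[{seg.start_s:.1f}s - {seg.end_s:.1f}s] {seg.function}: {seg.description}"
        for seg in ordered
    )


# Made-up example values, for illustration only.
example = [
    SegmentCaption(0.0, 15.0, "intro", "Solo piano with a slow, sparse melody."),
    SegmentCaption(15.0, 45.0, "verse", "Drums and bass enter; the tempo feels steadier."),
]
print(format_caption(example))
```

Keeping the time boundaries as explicit numeric fields, rather than embedding them only in free text, would make it straightforward to recover transition points or to slice the audio per segment for downstream tasks.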