Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
July 29, 2024
Authors: Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley
cs.AI
Abstract
Existing music captioning methods are limited to generating concise global
descriptions of short music clips, which fail to capture fine-grained musical
characteristics and time-aware musical changes. To address these limitations,
we propose FUTGA, a model equipped with fine-grained music understanding
capabilities through learning from generative augmentation with temporal
compositions. We leverage existing music caption datasets and large language
models (LLMs) to synthesize fine-grained music captions with structural
descriptions and time boundaries for full-length songs. Augmented by the
proposed synthetic dataset, FUTGA can identify the music's temporal
changes at key transition points and their musical functions, as well as
generate detailed descriptions for each music segment. We further introduce a
full-length music caption dataset generated by FUTGA, as the augmentation of
the MusicCaps and the Song Describer datasets. We evaluate the automatically
generated captions on several downstream tasks, including music generation and
retrieval. The experiments demonstrate the quality of the generated captions
and show that the proposed music captioning approach achieves better
performance on various downstream tasks. Our code and datasets can be found at
https://huggingface.co/JoshuaW1997/FUTGA.