CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning
June 21, 2025
Authors: Angelos-Nikolaos Kanatas, Charilaos Papaioannou, Alexandros Potamianos
cs.AI
Abstract
Recent advances in music foundation models have improved audio representation
learning, yet their effectiveness across diverse musical traditions remains
limited. We introduce CultureMERT-95M, a multi-culturally adapted foundation
model developed to enhance cross-cultural music representation learning and
understanding. To achieve this, we propose a two-stage continual pre-training
strategy that integrates learning rate re-warming and re-decaying, enabling
stable adaptation even with limited computational resources. Training on a
650-hour multi-cultural data mix, comprising Greek, Turkish, and Indian music
traditions, results in an average improvement of 4.9% in ROC-AUC and AP across
diverse non-Western music auto-tagging tasks, surpassing prior
state-of-the-art, with minimal forgetting on Western-centric benchmarks. We
further investigate task arithmetic, an alternative approach to multi-cultural
adaptation that merges single-culture adapted models in the weight space. Task
arithmetic performs on par with our multi-culturally trained model on
non-Western auto-tagging tasks and shows no regression on Western datasets.
Cross-cultural evaluation reveals that single-culture models transfer with
varying effectiveness across musical traditions, whereas the multi-culturally
adapted model achieves the best overall performance. To support research on
world music representation learning, we publicly release CultureMERT-95M and
CultureMERT-TA-95M, fostering the development of more culturally aware music
foundation models.
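The two-stage continual pre-training strategy relies on re-warming the learning rate back up and then re-decaying it, rather than resuming from the original schedule's final low value. A minimal sketch of such a schedule is below; the function name, peak/minimum rates, and warmup fraction are illustrative assumptions, not the paper's actual hyperparameters.

```python
import math

def rewarm_redecay_lr(step, total_steps, peak_lr=1e-4, min_lr=1e-5, warmup_frac=0.1):
    """Illustrative re-warm/re-decay schedule for continual pre-training.

    Hypothetical values: peak_lr, min_lr, and warmup_frac are placeholders,
    not the hyperparameters used for CultureMERT-95M.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Re-warming: ramp linearly from min_lr back up to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Re-decaying: cosine anneal from peak_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Re-warming restores enough plasticity for the model to adapt to the new cultural data mix, while the subsequent decay stabilizes training and limits forgetting of the original (Western-centric) pre-training.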
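Task arithmetic, the alternative merging approach investigated above, combines single-culture adapted models purely in weight space: each adapted model defines a task vector (its weights minus the base model's), and the merged model adds a scaled sum of these vectors back to the base. A minimal sketch, assuming checkpoints represented as name-to-array dicts; the function name and the scaling coefficient are hypothetical, not the paper's.

```python
import numpy as np

def merge_task_arithmetic(base, adapted_models, scale=0.3):
    """Merge single-culture adapted models into a base model via task arithmetic.

    base and each entry of adapted_models map parameter names to arrays.
    Each task vector is (adapted - base); the merged weights are the base
    plus the scaled sum of task vectors. `scale` is an illustrative
    merging coefficient, not the value used for CultureMERT-TA-95M.
    """
    merged = {}
    for name, w in base.items():
        # Sum the per-culture task vectors for this parameter tensor.
        task_sum = sum(m[name] - w for m in adapted_models)
        merged[name] = w + scale * task_sum
    return merged
```

Because merging happens entirely in weight space, no additional multi-cultural training run is needed once the single-culture adapted checkpoints exist.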