WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning

Abstract

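Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across a range of vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate how well MLLMs interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from a genuine musical composition and is accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To support systematic evaluation, we propose a taxonomy comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.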
