PianoCoRe: Combined and Refined Piano MIDI Dataset

May 7, 2026
Author: Ilya Borovik
cs.AI

Abstract

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 hours of performed music. PianoCoRe is released in tiered subsets to support different applications: from large-scale analysis and pre-training (PianoCoRe-C and the deduplicated PianoCoRe-B) to expressive performance modeling with note-level score alignment (PianoCoRe-A/A*). The note-aligned subset, PianoCoRe-A, is the largest open-source aligned collection to date, with 157,207 performances aligned to 1,591 scores. In addition to the dataset, the contributions are: (1) a MIDI quality classifier for detecting corrupted and score-like transcriptions and (2) RAScoP, an alignment refinement pipeline that cleans temporal alignment errors and interpolates missing notes. The analysis shows that the refinement reduces temporal noise and eliminates tempo outliers. Moreover, an expressive performance rendering model trained on PianoCoRe is more robust to unseen pieces than models trained on raw or smaller datasets. PianoCoRe provides a ready-to-use foundation for the next generation of expressive piano performance research.