
PianoCoRe: Combined and Refined Piano MIDI Dataset

May 7, 2026
Author: Ilya Borovik
cs.AI

Abstract

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 h of performed music. PianoCoRe is released in tiered subsets to support different applications: from large-scale analysis and pre-training (PianoCoRe-C and deduplicated PianoCoRe-B) to expressive performance modeling with note-level score alignment (PianoCoRe-A/A*). The note-aligned subset, PianoCoRe-A, provides the largest open-source collection of 157,207 performances aligned to 1,591 scores to date. In addition to the dataset, the contributions are: (1) a MIDI quality classifier for detecting corrupted and score-like transcriptions and (2) RAScoP, an alignment refinement pipeline that cleans temporal alignment errors and interpolates missing notes. The analysis shows that the refinement reduces temporal noise and eliminates tempo outliers. Moreover, an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets. PianoCoRe provides a ready-to-use foundation for the next generation of expressive piano performance research.
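The abstract says what RAScoP does (cleaning temporal alignment errors, interpolating missing notes) and what effect it has (fewer tempo outliers). It does not say how. To make the general idea concrete, here is a rough sketch. It assumes a note-level score-to-performance alignment stored as two parallel arrays: score onsets in beats, and performance onsets in seconds, with NaN where a score note has no matched performed note. Several pieces are hypothetical choices for illustration only:

- the helper names `interpolate_missing_onsets` and `flag_tempo_outliers`
- the piecewise-linear interpolation over score time
- the factor-of-2 deviation from the median tempo used as the outlier rule

None of this is the RAScoP implementation.

```python
# Hypothetical sketch of alignment gap-filling and tempo-outlier flagging.
# NOT the RAScoP pipeline from PianoCoRe; all names and thresholds are assumptions.
import numpy as np


def _anchors(score, perf):
    """Collapse aligned notes to one (score beat, performed second) pair per distinct score onset."""
    known = ~np.isnan(perf)
    xs = np.unique(score[known])  # sorted, strictly increasing score onsets
    # Chord notes share a score onset; use the mean of their performed onsets.
    ys = np.array([perf[known & (score == x)].mean() for x in xs])
    return xs, ys


def interpolate_missing_onsets(score_beats, perf_seconds):
    """Fill NaN performance onsets by piecewise-linear interpolation over score time."""
    score = np.asarray(score_beats, dtype=float)
    perf = np.asarray(perf_seconds, dtype=float)
    xs, ys = _anchors(score, perf)
    if len(xs) < 2:
        raise ValueError("need aligned notes at two or more distinct score onsets")
    missing = np.isnan(perf)
    filled = perf.copy()
    # Interior gaps: linear interpolation between neighbouring anchors.
    filled[missing] = np.interp(score[missing], xs, ys)
    # np.interp clamps outside [xs[0], xs[-1]]; extrapolate with the edge slopes instead.
    before = missing & (score < xs[0])
    after = missing & (score > xs[-1])
    slope_first = (ys[1] - ys[0]) / (xs[1] - xs[0])
    slope_last = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
    filled[before] = ys[0] + slope_first * (score[before] - xs[0])
    filled[after] = ys[-1] + slope_last * (score[after] - xs[-1])
    return filled


def flag_tempo_outliers(score_beats, perf_seconds, max_ratio=2.0):
    """Local tempo (BPM) between consecutive anchors, plus a mask of outlier segments."""
    score = np.asarray(score_beats, dtype=float)
    perf = np.asarray(perf_seconds, dtype=float)
    xs, ys = _anchors(score, perf)
    dx, dy = np.diff(xs), np.diff(ys)
    tempo = np.full(dx.shape, np.nan)
    # Non-increasing performed time signals an alignment error: leave tempo as NaN.
    np.divide(60.0 * dx, dy, out=tempo, where=dy > 0)
    # Flag segments far from the piece's median tempo (assumed factor-of-2 rule).
    ratio = tempo / np.nanmedian(tempo)
    flags = np.isnan(tempo) | (ratio > max_ratio) | (ratio < 1.0 / max_ratio)
    return tempo, flags


if __name__ == "__main__":
    score = [0, 1, 1, 2, 3, 4]  # beats; the two notes at beat 1 form a chord
    perf = [0.0, 0.5, np.nan, 1.0, np.nan, 2.0]  # seconds; NaN marks an unaligned note
    print(interpolate_missing_onsets(score, perf))
    print(flag_tempo_outliers([0, 1, 2, 3, 4], [0.0, 0.5, 1.0, 1.0625, 1.5625]))
```

In the first example, the unmatched chord note at beat 1 takes its partner's 0.5 s onset, and the note at beat 3 is placed at 1.5 s. In the second example, the 960 BPM segment between beats 2 and 3 is flagged against a 120 BPM median. The anchors are collapsed per score onset rather than per note so that chord notes stay together and zero-length score intervals never reach the tempo division.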
May 9, 2026