How Much Genre Identity Can Chord-Symbol Time-Series Adaptation Carry? Capabilities and Limits in Multi-Genre Chord-Symbol Modeling

Abstract

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music, but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz Music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares five methods (LoRA, IA3, BitFit, prefix tuning, and full fine-tuning) over 11 genres and 3 seeds, giving a complete 165-cell grid (5 × 11 × 3). All five methods improve over the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this point: when genres are subsampled to a common corpus size, IA3 stays on top, but LoRA's full-data edge disappears and it falls to last place, indicating that the small gaps between methods are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting that much of the effect comes from lightweight conditioning over a reusable harmonic base rather than from any one adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. The report therefore makes no claims about perceived genre authenticity or full musical quality, which would require controlled listener or musician evaluation.
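As a rough illustration of the evaluation design named above, the following sketch enumerates the 5 × 11 × 3 grid and applies Holm and Benjamini-Hochberg adjustments to a list of pairwise p-values. It is a minimal standard-library sketch, not the report's actual pipeline: the seed values, the names `grid`, `pairs`, `holm`, and `benjamini_hochberg`, and the example p-values are all assumptions made for illustration. The raw p-values would come from a paired Wilcoxon signed-rank test per method pair (for example via `scipy.stats.wilcoxon`), which is not reproduced here.

```python
from itertools import combinations, product

METHODS = ["LoRA", "IA3", "BitFit", "prefix tuning", "full fine-tuning"]
GENRES = ["blues", "bossa nova", "Bach chorales", "country", "electronic",
          "folk", "funk", "gospel", "hip-hop", "R&B/soul", "rock"]
SEEDS = [0, 1, 2]  # hypothetical seed values; the report only states 3 seeds

# One cell per (method, genre, seed): 5 * 11 * 3 = 165 cells.
grid = list(product(METHODS, GENRES, SEEDS))

# Every unordered method pair gets one paired test: C(5, 2) = 10 comparisons.
pairs = list(combinations(METHODS, 2))


def holm(pvals):
    """Holm step-down adjusted p-values (controls family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1); the running
        # maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        # The k-th smallest p-value is scaled by m / k; the running minimum
        # from the largest p-value downward keeps the values monotone.
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up raw p-values, one per method pair, purely for illustration.
    raw = [0.20, 0.03, 0.01, 0.45, 0.08, 0.02, 0.60, 0.04, 0.30, 0.15]
    for pair, h, b in zip(pairs, holm(raw), benjamini_hochberg(raw)):
        print(f"{pair[0]} vs {pair[1]}: Holm={h:.3f}, BH={b:.3f}")
```

Holm is the more conservative of the two corrections, so a comparison that survives Benjamini-Hochberg but not Holm is best read as suggestive rather than decisive, which is consistent with the cautious reading of the LoRA versus IA3 gap.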