How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

June 5, 2026
Author: Jinju Lee
cs.AI

Abstract

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music, but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz Music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares LoRA, IA3, BitFit, prefix tuning, and full fine-tuning over 11 genres and 3 seeds, a complete 165-cell grid. All five methods improve over the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this: when genres are sub-sampled to a common corpus size, IA3 stays on top but LoRA's full-data edge disappears and it falls to last, indicating the small gaps are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting much of the effect comes from lightweight conditioning over a reusable harmonic base rather than one particular adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. The report therefore avoids claims about perceived genre authenticity or full musical quality, which require controlled listener or musician evaluation.
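
As a rough illustration of the evaluation design described above, the sketch below enumerates the 5 methods x 11 genres x 3 seeds grid and applies a Holm step-down correction to a set of pairwise p-values. It is a minimal sketch under stated assumptions, not the report's code: the seed values, the `holm_adjust` helper, and the example p-values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

# Method and genre names are taken from the abstract.
METHODS = ["LoRA", "IA3", "BitFit", "prefix tuning", "full fine-tuning"]
GENRES = ["blues", "bossa nova", "Bach chorales", "country", "electronic",
          "folk", "funk", "gospel", "hip-hop", "R&B/soul", "rock"]
SEEDS = [0, 1, 2]  # assumed seed values; the abstract only says "3 seeds"

# One cell per (method, genre, seed) combination.
grid = list(product(METHODS, GENRES, SEEDS))

# Pairwise method comparisons that a multiple-testing correction must cover.
pairs = list(combinations(METHODS, 2))


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is scaled by (m - rank).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    print(len(grid), "grid cells;", len(pairs), "pairwise comparisons")
    # Hypothetical raw p-values, purely illustrative (not from the report).
    print(holm_adjust([0.01, 0.04, 0.03]))
```

With five methods there are ten pairwise comparisons, which is why an uncorrected test can overstate small differences between adapters. In practice the raw p-values would come from a paired test such as the Wilcoxon signed-rank test over matched genre and seed cells.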