How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Limits of Multi-Genre Chord-Symbol Modeling

Abstract

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music, but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz Music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares LoRA, IA3, BitFit, prefix tuning, and full fine-tuning over 11 genres and 3 seeds, a complete 165-cell grid. All five methods improve over the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this: when genres are sub-sampled to a common corpus size, IA3 stays on top but LoRA's full-data edge disappears and it falls to last, indicating the small gaps are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting much of the effect comes from lightweight conditioning over a reusable harmonic base rather than one particular adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. The report therefore avoids claims about perceived genre authenticity or full musical quality, which require controlled listener or musician evaluation.
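The abstract reports Wilcoxon tests with Holm and Benjamini-Hochberg correction but does not spell out how the comparisons are paired. The following is a minimal, dependency-free sketch under assumptions that are not stated in the report. Each pair of methods is compared with a paired signed-rank test over the 11 genres, using seed-averaged scores. The correction family is taken to be all pairwise comparisons among the five methods. The function names (`average_ranks`, `wilcoxon_exact`, `holm`, `benjamini_hochberg`, `pairwise_method_tests`) are hypothetical and are for illustration only.

```python
from itertools import combinations, product


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Zero differences are dropped (Wilcoxon's convention). The null
    distribution is enumerated over all 2**n sign patterns, so n must stay
    small (n = 11 genres gives 2,048 patterns).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    center = sum(ranks) / 2  # null mean of W+
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


def holm(pvalues):
    """Holm step-down adjusted p-values (family-wise error control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def pairwise_method_tests(scores):
    """Compare every pair of methods on per-genre scores.

    Assumption (hypothetical, not from the report): scores maps each method
    name to a list of seed-averaged per-genre scores in a fixed genre order.
    Returns (method_a, method_b, raw_p, holm_p, bh_p) for each pair.
    """
    pairs = list(combinations(sorted(scores), 2))
    raw = [wilcoxon_exact(scores[a], scores[b]) for a, b in pairs]
    return [
        (a, b, p, h, q)
        for (a, b), p, h, q in zip(pairs, raw, holm(raw), benjamini_hochberg(raw))
    ]
```

In the assumed setup, the input would be five methods with 11 genre-level scores each, which gives 10 pairwise comparisons. One property of this setup explains why a decisive winner is hard to establish. With 11 nonzero, untied differences, the smallest attainable exact two-sided p-value is 2/2048 ≈ 0.001. Holm's procedure can multiply that value by up to 10. A real analysis would more likely use `scipy.stats.wilcoxon` together with `statsmodels.stats.multitest.multipletests` (`method="holm"` or `"fdr_bh"`). The pure-Python version is shown only so that each step is visible.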