TuneJury: An Open Metric for Improving Music Generation Preference Alignment
June 15, 2026
Authors: Yonghyun Kim, Junwon Lee, Haiwen Xia, Yinghao Ma, Junghyun Koo, Koichi Saito, Yuki Mitsufuji, Chris Donahue
cs.AI
Abstract
We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
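
As a rough illustration of how a pairwise reward score of this kind can be used, the sketch below shows a Bradley-Terry reading of a score margin, threshold-based filtering of preference pairs, and best-of-N selection. It is a minimal sketch under stated assumptions, not the TuneJury implementation: the function names, the logistic link, and the threshold value are hypothetical, and `score_fn` stands in for whatever scoring call the released checkpoint actually exposes.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Clip = TypeVar("Clip")


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that clip A is preferred over clip B.

    Assumes a logistic link on the score margin (hypothetical; the abstract
    does not state the exact link function). Written in a numerically
    stable form so very large margins do not overflow.
    """
    margin = score_a - score_b
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def filter_pairs(
    scored_pairs: Sequence[Tuple[float, float]], min_margin: float
) -> List[Tuple[float, float]]:
    """Keep only pairs whose absolute score margin reaches min_margin."""
    return [(a, b) for a, b in scored_pairs if abs(a - b) >= min_margin]


def best_of_n(candidates: Sequence[Clip], score_fn: Callable[[Clip], float]) -> Clip:
    """Return the candidate with the highest reward score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=score_fn)


if __name__ == "__main__":
    # Toy scores standing in for reward-model outputs (made-up numbers).
    toy_scores = {"clip_a": 0.3, "clip_b": 1.7, "clip_c": 0.9}
    winner = best_of_n(list(toy_scores), toy_scores.__getitem__)
    print(winner, preference_probability(toy_scores["clip_b"], toy_scores["clip_a"]))
```

In practice the filtering threshold would be chosen from a calibration curve on held-out pairs rather than fixed in advance.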