Preference Leakage: A Contamination Problem in LLM-as-a-judge
February 3, 2025
Authors: Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu
cs.AI
Abstract
Large Language Models (LLMs) as judges and LLM-based data synthesis have
emerged as two fundamental LLM-driven data annotation methods in model
development. While their combination significantly enhances the efficiency of
model training and evaluation, little attention has been given to the potential
contamination brought by this new model development paradigm. In this work, we
expose preference leakage, a contamination problem in LLM-as-a-judge caused by
the relatedness between the synthetic data generators and LLM-based evaluators.
To study this issue, we first define three common types of relatedness between the
data generator LLM and the judge LLM: being the same model, having an inheritance
relationship, and belonging to the same model family. Through extensive
experiments, we empirically confirm the bias of judges towards their related
student models caused by preference leakage across multiple LLM baselines and
benchmarks. Further analysis suggests that preference leakage is a pervasive
issue that is harder to detect than previously identified biases in
LLM-as-a-judge scenarios. All of these findings imply that preference leakage
is a widespread and challenging problem in the area of LLM-as-a-judge. We
release all code and data at:
https://github.com/David-Li0406/Preference-Leakage.