Preference Leakage: A Contamination Problem in LLM-as-a-judge

Abstract

Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly improves the efficiency of model training and evaluation, little attention has been paid to the contamination this new development paradigm may introduce. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between synthetic data generators and LLM-based evaluators. To study this issue, we first define three common types of relatedness between the data generator LLM and the judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments across multiple LLM baselines and benchmarks, we empirically confirm that preference leakage biases judges toward their related student models. Further analysis suggests that preference leakage is a pervasive issue that is harder to detect than previously identified biases in LLM-as-a-judge scenarios. Together, these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. We release all code and data at: https://github.com/David-Li0406/Preference-Leakage.

