Who's Your Judge? On the Detectability of LLM-Generated Judgments
September 29, 2025
Authors: Dawei Li, Zhen Tan, Chengshuai Zhao, Bohan Jiang, Baixiang Huang, Pingchuan Ma, Abdullah Alnaibari, Kai Shu, Huan Liu
cs.AI
Abstract
Large Language Model (LLM)-based judgments leverage powerful LLMs to
efficiently evaluate candidate content and provide judgment scores. However,
the inherent biases and vulnerabilities of LLM-generated judgments raise
concerns, underscoring the urgent need for distinguishing them in sensitive
scenarios like academic peer reviewing. In this work, we propose and formalize
the task of judgment detection and systematically investigate the detectability
of LLM-generated judgments. Unlike LLM-generated text detection, judgment
detection relies solely on judgment scores and candidates, reflecting
real-world scenarios where textual feedback is often unavailable in the
detection process. Our preliminary analysis shows that existing LLM-generated
text detection methods perform poorly because they cannot capture the
interaction between judgment scores and candidate content -- an aspect crucial
for effective judgment detection. Motivated by this finding, we introduce
J-Detector, a lightweight and transparent neural detector augmented
with explicitly extracted linguistic and LLM-enhanced features to link LLM
judges' biases with candidates' properties for accurate detection. Experiments
across diverse datasets demonstrate the effectiveness of J-Detector
and show how its interpretability enables quantifying biases in LLM judges.
Finally, we analyze key factors affecting the detectability of LLM-generated
judgments and validate the practical utility of judgment detection in
real-world scenarios.
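To make the task setup concrete, the sketch below shows what a minimal judgment detector over (score, candidate) pairs could look like. This is purely illustrative and is not the authors' J-Detector: the feature set, interaction term, and weights are all hypothetical, chosen only to show how a detector might couple a judgment score with properties of the candidate content.

```python
# Illustrative sketch (NOT the paper's J-Detector): a minimal judgment
# detector that scores (judgment score, candidate) pairs with a logistic
# model over hand-crafted features. All features and weights are
# hypothetical placeholders.
import math
from dataclasses import dataclass


@dataclass
class Judgment:
    score: float     # judgment score, e.g. a 1-10 review rating
    candidate: str   # the content being judged


def features(j: Judgment) -> list[float]:
    """Features coupling the score with properties of the candidate.

    The abstract stresses the score-candidate interaction, so the last
    feature is an explicit interaction term.
    """
    words = j.candidate.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / max(n_words, 1)
    return [
        j.score,
        math.log1p(n_words),
        avg_word_len,
        j.score * math.log1p(n_words),  # score x length interaction
    ]


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def detect(j: Judgment, weights: list[float], bias: float) -> float:
    """Return a probability that the judgment is LLM-generated."""
    z = bias + sum(w * f for w, f in zip(weights, features(j)))
    return logistic(z)


# Made-up weights for illustration; a real detector would learn these
# from labeled human- and LLM-generated judgments.
w = [0.8, -0.3, 0.1, -0.05]
j = Judgment(score=9.0, candidate="A thorough and novel study.")
p = detect(j, w, bias=-2.0)
print(f"P(LLM-generated) = {p:.3f}")
```

A trained detector in this spirit would fit the weights on labeled judgments; the paper's actual J-Detector additionally uses explicitly extracted linguistic and LLM-enhanced features, which this sketch omits.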