Who's Your Judge? On the Detectability of LLM-Generated Judgments

Abstract

Large Language Model (LLM)-based judgments use powerful LLMs to efficiently evaluate candidate content and assign judgment scores. However, the inherent biases and vulnerabilities of LLM-generated judgments raise concerns, underscoring the urgent need to distinguish them in sensitive scenarios such as academic peer review. In this work, we propose and formalize the task of judgment detection and systematically investigate the detectability of LLM-generated judgments. Unlike LLM-generated text detection, judgment detection relies solely on judgment scores and candidates, reflecting real-world scenarios in which textual feedback is often unavailable at detection time. Our preliminary analysis shows that existing LLM-generated text detection methods perform poorly because they cannot capture the interaction between judgment scores and candidate content, an aspect crucial for effective judgment detection. Motivated by this, we introduce J-Detector, a lightweight and transparent neural detector augmented with explicitly extracted linguistic and LLM-enhanced features that link LLM judges' biases to candidates' properties for accurate detection. Experiments across diverse datasets demonstrate the effectiveness of J-Detector and show how its interpretability enables quantifying biases in LLM judges. Finally, we analyze key factors affecting the detectability of LLM-generated judgments and validate the practical utility of judgment detection in real-world scenarios.
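To make the task setup concrete, the following is a minimal sketch of judgment detection as binary classification over (candidate, score) pairs. It is not the J-Detector implementation: it swaps in a plain logistic regression trained by gradient descent, and the `Judgment` type, the 1-10 score scale, the word-count feature, and the toy data are all hypothetical choices made only for illustration. The one idea it tries to show is that the detector sees an explicit interaction term between the score and a property of the candidate, rather than text alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Judgment:
    candidate: str  # the content that was judged
    score: float    # the judgment score (no textual feedback is available)


def extract_features(j: Judgment) -> list[float]:
    """Hypothetical features: bias, score, one candidate property, their product."""
    s = j.score / 10.0                                   # assumes a 1-10 scale
    length = min(len(j.candidate.split()) / 100.0, 1.0)  # word count, capped at 1.0
    return [1.0, s, length, s * length]                  # last term: score x candidate


def predict_proba(weights: list[float], features: list[float]) -> float:
    """Probability that a judgment is LLM-generated under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights: list[float], X: list[list[float]], y: list[int]) -> float:
    """Mean binary cross-entropy; probabilities are clamped for numerical safety."""
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(weights, x), 1e-12), 1.0 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def fit(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 200) -> list[float]:
    """Full-batch gradient descent on the logistic loss, starting from zero weights."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, t in zip(X, y):
            err = predict_proba(weights, x) - t
            for i, xi in enumerate(x):
                grad[i] += err * xi
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Synthetic toy data, invented for this sketch only (label 1 = LLM-generated).
# It makes no claim about how real LLM or human judges behave.
TOY_DATA = [
    (Judgment("token " * 90, 9.0), 1),
    (Judgment("token " * 80, 8.0), 1),
    (Judgment("token " * 10, 3.0), 1),
    (Judgment("token " * 90, 4.0), 0),
    (Judgment("token " * 15, 8.0), 0),
    (Judgment("token " * 50, 6.0), 0),
]

X = [extract_features(j) for j, _ in TOY_DATA]
y = [label for _, label in TOY_DATA]
weights = fit(X, y)
print(predict_proba(weights, extract_features(Judgment("token " * 85, 9.0))))
```

Because the model is linear, each learned weight can be read directly as the direction in which a feature pushes the prediction. This is the kind of transparency the abstract refers to, shown here in a much simpler model than the one it describes.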

