Who's Your Judge? On the Detectability of LLM-Generated Judgments

Abstract

Large Language Model (LLM)-based judgments leverage powerful LLMs to efficiently evaluate candidate content and provide judgment scores. However, the inherent biases and vulnerabilities of LLM-generated judgments raise concerns, underscoring the urgent need to distinguish them in sensitive scenarios such as academic peer review. In this work, we propose and formalize the task of judgment detection and systematically investigate the detectability of LLM-generated judgments. Unlike LLM-generated text detection, judgment detection relies solely on judgment scores and candidates, reflecting real-world scenarios where textual feedback is often unavailable during detection. Our preliminary analysis shows that existing LLM-generated text detection methods perform poorly because they cannot capture the interaction between judgment scores and candidate content, an aspect crucial for effective judgment detection. Inspired by this, we introduce J-Detector, a lightweight and transparent neural detector augmented with explicitly extracted linguistic and LLM-enhanced features that link LLM judges' biases with candidates' properties for accurate detection. Experiments across diverse datasets demonstrate the effectiveness of J-Detector and show how its interpretability enables quantifying biases in LLM judges. Finally, we analyze key factors affecting the detectability of LLM-generated judgments and validate the practical utility of judgment detection in real-world scenarios.

