When to Trust Context: Self-Reflective Debates for Context Reliability

Abstract

Large language models frequently encounter conflicts between their parametric knowledge and contextual input, often resulting in factual inconsistencies or hallucinations. We propose Self-Reflective Debate for Contextual Reliability (SR-DCR), a lightweight framework that integrates token-level self-confidence with an asymmetric multi-agent debate to adjudicate such conflicts. A critic, deprived of context, challenges a defender who argues from the given passage; a judge model evaluates the debate and determines the context's reliability. The final answer is selected by combining the verdict with model confidence. Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently enhances robustness to misleading context while maintaining accuracy on trustworthy inputs, outperforming both classical debate and confidence-only baselines with minimal computational overhead. The code is available at https://github.com/smiles724/Self-Reflective-Debates.
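The abstract says the final answer combines the judge's verdict with model confidence but does not spell out the rule. The following is a minimal sketch of one plausible reading, not the authors' implementation. The names `Candidate`, `self_confidence`, and `select_answer`, the geometric-mean confidence measure, and the `threshold` value of 0.9 are all assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """An answer plus the log-probabilities of its generated tokens."""
    answer: str
    token_logprobs: Sequence[float]


def self_confidence(token_logprobs: Sequence[float]) -> float:
    """Assumed confidence measure: geometric mean of token probabilities."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def select_answer(
    prior: Candidate,
    contextual: Candidate,
    context_reliable: bool,
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> str:
    """Hypothetical selection rule combining verdict and confidence.

    prior:            answer produced without the passage (parametric knowledge)
    contextual:       answer produced with the passage
    context_reliable: the judge's verdict after the critic/defender debate
    """
    if context_reliable:
        # The judge trusts the passage, so follow the context-based answer.
        return contextual.answer
    if self_confidence(prior.token_logprobs) >= threshold:
        # The passage is judged unreliable and the model is confident
        # in its own knowledge, so fall back to the context-free answer.
        return prior.answer
    # The passage is judged unreliable but the model is unsure:
    # this sketch defaults to the context-based answer.
    return contextual.answer
```

Under these assumptions, the context-free answer overrides the passage only when the judge rejects the context and the model is confident in its own answer. This is one way to stay robust to misleading passages without giving up accuracy on trustworthy ones.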
