Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

Abstract

Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and verifiability are paramount. In this paper, we introduce DAVinCI, a Dual Attribution and Verification framework designed to enhance the factual reliability and interpretability of LLM outputs. DAVinCI operates in two stages: (i) it attributes generated claims to internal model components and external sources; (ii) it verifies each claim using entailment-based reasoning and confidence calibration. We evaluate DAVinCI on multiple datasets, including FEVER and CLIMATE-FEVER, and compare its performance against standard verification-only baselines. Our results show that DAVinCI improves classification accuracy, attribution precision, recall, and F1-score by 5-20% over these baselines. Through an extensive ablation study, we isolate the contributions of evidence span selection, recalibration thresholds, and retrieval quality. We also release a modular DAVinCI implementation that can be integrated into existing LLM pipelines. By bridging attribution and verification, DAVinCI offers a scalable path to auditable, trustworthy AI systems. This work contributes to the growing effort to make LLMs not only powerful but also accountable.
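The two-stage flow described in the abstract (attribute a claim to evidence, then verify it and gate the verdict on confidence) can be illustrated with a minimal sketch. This is not the released DAVinCI implementation: the names `attribute`, `toy_entailment`, `verify`, and `Verdict`, the token-overlap retrieval, the crude negation heuristic, and the 0.6 threshold are all hypothetical stand-ins chosen for illustration. The sketch covers only attribution to external sources; attribution to internal model components is not modeled. The labels follow the FEVER convention (SUPPORTS, REFUTES, NOT ENOUGH INFO).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    claim: str
    evidence: Optional[str]
    label: str          # FEVER-style: SUPPORTS, REFUTES, NOT ENOUGH INFO
    confidence: float


def _tokens(text: str) -> set:
    # Lowercase, drop periods, split on whitespace (deliberately naive).
    return set(text.lower().replace(".", "").split())


def attribute(claim: str, corpus: list) -> Optional[str]:
    """Stage 1 (toy): return the passage sharing the most tokens with the claim."""
    claim_tokens = _tokens(claim)
    best, best_overlap = None, 0
    for passage in corpus:
        overlap = len(claim_tokens & _tokens(passage))
        if overlap > best_overlap:
            best, best_overlap = passage, overlap
    return best


def toy_entailment(claim: str, evidence: str) -> dict:
    """Hypothetical placeholder for an NLI model.

    Token coverage acts as the score; a mismatch in the word "not"
    between claim and evidence moves that score to REFUTES.
    """
    claim_tokens = _tokens(claim)
    evidence_tokens = _tokens(evidence)
    coverage = len(claim_tokens & evidence_tokens) / max(len(claim_tokens), 1)
    negation_mismatch = ("not" in claim_tokens) != ("not" in evidence_tokens)
    if negation_mismatch:
        return {"SUPPORTS": 0.0, "REFUTES": coverage}
    return {"SUPPORTS": coverage, "REFUTES": 0.0}


def verify(
    claim: str,
    corpus: list,
    entail: Callable[[str, str], dict] = toy_entailment,
    threshold: float = 0.6,  # assumed value, standing in for a recalibration threshold
) -> Verdict:
    """Stage 2 (toy): score the attributed evidence and gate on confidence."""
    evidence = attribute(claim, corpus)
    if evidence is None:
        return Verdict(claim, None, "NOT ENOUGH INFO", 0.0)
    scores = entail(claim, evidence)
    label = max(scores, key=scores.get)
    confidence = scores[label]
    if confidence < threshold:
        label = "NOT ENOUGH INFO"  # abstain when the verifier is unsure
    return Verdict(claim, evidence, label, confidence)


corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
for claim in ("Paris is the capital of France",
              "Paris is not the capital of France",
              "Bananas grow on Mars"):
    print(verify(claim, corpus))
```

In a realistic pipeline, `attribute` would be replaced by a retriever and `toy_entailment` by a trained entailment model passed in through the `entail` argument. Keeping the verifier pluggable mirrors the modular design the abstract mentions, and returning the evidence alongside the label is what makes each verdict auditable.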
