Mitigating Multimodal Hallucination via Phase-wise Self-reward

Abstract

Large Vision-Language Models (LVLMs) still struggle with visual hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the dynamic nature of hallucination emergence. To address these limitations, we introduce a new self-rewarding framework that enables dynamic hallucination mitigation at inference time without external supervision. Empirically, we show that visual hallucination exhibits phase-wise dynamic patterns, peaking at the onset of each semantic phase. Drawing on this insight, we propose PSRD (Phase-wise Self-Reward Decoding), an online hallucination correction method guided by phase-wise self-reward signals. To reduce the cost of repeated self-evaluation during decoding, we distill the hallucination guidance signal from LVLMs into a lightweight reward model. This reward model then provides on-the-fly guidance for targeted intervention during decoding, enabling precise hallucination suppression. PSRD reduces the hallucination rate of LLaVA-1.5-7B by 50.0% and consistently outperforms existing post-hoc methods across five hallucination evaluation benchmarks on four LVLMs. Further analysis confirms that PSRD effectively mitigates hallucination propagation and achieves a highly controllable trade-off between strong performance and inference efficiency.
