Online Self-Calibration Against Hallucination in Vision-Language Models

Abstract

Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Perception Mismatch: the student model is forced to align with fine-grained details beyond its perceptual capacity, so it learns to guess rather than to see. To obtain reliable self-supervision for online learning, we identify a Generative-Discriminative Gap within LVLMs, where models exhibit higher accuracy on discriminative verification than on open-ended generation. Leveraging this capability, we propose Online Self-CAlibRation (OSCAR), a framework that integrates Monte Carlo Tree Search with a Dual-Granularity Reward Mechanism to construct preference data and iteratively refines the model via Direct Preference Optimization. Extensive experiments demonstrate that OSCAR achieves state-of-the-art performance on hallucination benchmarks while improving general multimodal capabilities.
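
As a rough illustration of the final optimization step named above, the standard Direct Preference Optimization loss for a single preference pair can be sketched as follows. This is a minimal sketch of the generic DPO objective, not OSCAR's implementation: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are hypothetical, and the abstract does not specify how the search and reward stages produce the chosen and rejected responses.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a response under
    the trainable policy or the frozen reference model. `beta` is an
    assumed default, not a value taken from the paper.
    """
    # Implicit reward of each response: beta * log(pi / pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)   # policy == reference
loss_after_shift = dpo_loss(-8.0, -14.0, -10.0, -12.0)  # policy favors chosen
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one.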


