Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models
May 30, 2025
Authors: Qianqi Yan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang
cs.AI
Abstract
Multimodal large language models (MLLMs) are increasingly deployed in
open-ended, real-world environments where inputs are messy, underspecified, and
not always trustworthy. Unlike curated benchmarks, these settings frequently
involve instructions that refer to missing objects or contradictory facts, rely
on ambiguous references, or request infeasible actions. In such cases, success
hinges not on task execution alone, but on a model's ability to detect when
something is silently wrong. This paper presents a systematic analysis of how
current MLLMs handle such implicit reasoning scenarios: cases where the flaw is
not explicitly stated but must be inferred from context. Using a curated
diagnostic suite spanning four categories of real-world failure modes, we
evaluate six MLLMs, including o3 and GPT-4o, and find that models frequently
fail to surface hidden issues, even when they possess the necessary perceptual
and reasoning skills. Explicit prompting reveals that the underlying
capabilities exist but are often suppressed in favor of user compliance. We
further show that simple inference-time interventions, such as cautious persona
prompting and, in particular, requiring a clarifying question, can dramatically
recover performance. Our findings highlight a persistent gap between reasoning
competence and behavioral compliance in current MLLMs and suggest practical
strategies for making these models more trustworthy in underconstrained
environments.
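
To make the inference-time interventions concrete, the sketch below illustrates the two strategies named in the abstract: a cautious-persona system prompt and an instruction requiring a clarifying question before acting. This is not the paper's evaluation harness; the prompt wording, the helper function, the example image URL, and the use of the OpenAI chat API with the "gpt-4o" model are illustrative assumptions.

```python
# Minimal sketch of two inference-time interventions for implicit-reasoning
# failures: (1) cautious persona prompting, (2) requiring a clarifying question.
# Prompt text and model choice are assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()

CAUTIOUS_PERSONA = (
    "You are a careful assistant. Before following an instruction, check whether "
    "it refers to missing objects, contradicts the image, relies on ambiguous "
    "references, or requests an infeasible action."
)

CLARIFY_FIRST = (
    "If the request is underspecified or inconsistent with the image, ask one "
    "clarifying question instead of executing the task."
)

def query_with_intervention(instruction: str, image_url: str, intervention: str) -> str:
    """Send an image-grounded instruction with an intervention prepended as the system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": intervention},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": instruction},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    )
    return response.choices[0].message.content

# Example: an instruction that silently refers to an object absent from the image
# (a hypothetical scenario in the spirit of the diagnostic suite).
# print(query_with_intervention("Hand me the red mug on the desk.",
#                               "https://example.com/desk.jpg", CLARIFY_FIRST))
```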