Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
September 26, 2025
Authors: Xinlei Yu, Chengming Xu, Guibin Zhang, Yongbo He, Zhangquan Chen, Zhucun Xue, Jiangning Zhang, Yue Liao, Xiaobin Hu, Yu-Gang Jiang, Shuicheng Yan
cs.AI
Abstract
Multi-agent systems (MAS) powered by visual language models (VLMs) can tackle
challenging tasks but suffer from a novel failure mode, multi-agent visual
hallucination snowballing, in which hallucinations seeded in a single agent
are amplified by subsequent agents because of over-reliance on textual flow to
relay visual information. Through turn-, layer-, and token-wise attention
analyses, we provide detailed insights into the essence of hallucination
snowballing: a reduction in visual attention allocation. These analyses lead
us to identify a subset of vision tokens with a unimodal attention peak in the
middle layers; these tokens best preserve visual evidence but gradually
diminish in deeper agent turns, resulting in visual hallucination snowballing
in MAS. We therefore propose ViF, a lightweight, plug-and-play mitigation
paradigm that relays inter-agent messages with Visual Flow powered by the
selected visual relay tokens and applies attention reallocation to amplify
this pattern. Experimental results demonstrate that our method markedly
reduces hallucination snowballing, consistently improving performance across
eight benchmarks based on four common MAS structures and ten base models. The
source code will be available at: https://github.com/YU-deep/ViF.git.
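
To make the token-selection idea concrete, the following is a minimal sketch of how vision tokens with a unimodal, middle-layer attention peak might be picked out. It is not the authors' implementation: the input layout `attn[t][l]` (attention received by vision token `t` at layer `l`), the function names, the tolerance, and the choice of the middle third of layers as "middle" are all assumptions made for illustration.

```python
from typing import List, Sequence


def is_unimodal(profile: Sequence[float], tol: float = 1e-6) -> bool:
    """Return True if the profile weakly rises to one peak, then weakly falls."""
    peak = max(range(len(profile)), key=profile.__getitem__)
    rising = all(profile[i] <= profile[i + 1] + tol for i in range(peak))
    falling = all(
        profile[i] + tol >= profile[i + 1]
        for i in range(peak, len(profile) - 1)
    )
    return rising and falling


def select_relay_tokens(attn: Sequence[Sequence[float]]) -> List[int]:
    """Pick token indices whose layer-wise attention peaks once, mid-network.

    attn[t][l] is a hypothetical layout: attention mass that vision token t
    receives at layer l. "Middle layers" is assumed to mean the middle third.
    """
    selected = []
    for t, profile in enumerate(attn):
        n = len(profile)
        if n == 0:
            continue
        peak = max(range(n), key=profile.__getitem__)
        in_middle = n // 3 <= peak < (2 * n) // 3  # assumed middle-third window
        if in_middle and is_unimodal(profile):
            selected.append(t)
    return selected


# Toy profiles over six layers (made-up numbers, not data from the paper).
toy_attn = [
    [0.1, 0.2, 0.5, 0.9, 0.4, 0.1],    # single peak at layer 3: selected
    [0.9, 0.5, 0.3, 0.2, 0.1, 0.05],   # peak at layer 0: not a middle layer
    [0.1, 0.6, 0.2, 0.7, 0.2, 0.1],    # two bumps: not unimodal
]
relay_ids = select_relay_tokens(toy_attn)
print(relay_ids)  # [0]
```

In a real system the profiles would come from the model's per-layer attention maps, and the selected tokens would then be relayed between agents; how ViF actually scores, selects, and reweights tokens is specified in the paper and repository, not here.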