MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion
October 26, 2025
Authors: Haoyi Qiu, Yilun Zhou, Pranav Narayanan Venkit, Kung-Hsiang Huang, Jiaxin Zhang, Nanyun Peng, Chien-Sheng Wu
cs.AI
Abstract
As Large Vision-Language Models (LVLMs) are increasingly deployed in domains
such as shopping, health, and news, they are exposed to pervasive persuasive
content. A critical question is how these models function as persuadees: how and
why they can be influenced by persuasive multimodal inputs. Understanding both
their susceptibility to persuasion and the effectiveness of different
persuasive strategies is crucial, as overly persuadable models may adopt
misleading beliefs, override user preferences, or generate unethical or unsafe
outputs when exposed to manipulative messages. We introduce MMPersuade, a
unified framework for systematically studying multimodal persuasion dynamics in
LVLMs. MMPersuade contributes (i) a comprehensive multimodal dataset that pairs
images and videos with established persuasion principles across commercial,
subjective and behavioral, and adversarial contexts, and (ii) an evaluation
framework that quantifies both persuasion effectiveness and model
susceptibility via third-party agreement scoring and self-estimated token
probabilities on conversation histories. Our study of six leading LVLMs as
persuadees yields three key insights: (i) multimodal inputs substantially
increase persuasion effectiveness (and model susceptibility) compared to text
alone, especially in misinformation scenarios; (ii) stated prior preferences
decrease susceptibility, yet multimodal information maintains its persuasive
advantage; and (iii) different strategies vary in effectiveness across
contexts, with reciprocity being most potent in commercial and subjective
contexts, and credibility and logic prevailing in adversarial contexts. By
jointly analyzing persuasion effectiveness and susceptibility, MMPersuade
provides a principled foundation for developing models that are robust,
preference-consistent, and ethically aligned when engaging with persuasive
multimodal content.