

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

July 29, 2025
作者: Yanxu Zhu, Shitong Duan, Xiangxu Zhang, Jitao Sang, Peng Zhang, Tun Lu, Xiao Zhou, Jing Yao, Xiaoyuan Yi, Xing Xie
cs.AI

Abstract

Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet may produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MLLMs' capability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark consisting of 12k+ visual question samples, whose quality is guaranteed by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MLLMs and conducted a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MLLMs' honesty is not solely a language modeling issue but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. Therefore, we implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs. Our data and code can be found at https://github.com/DSTTSD/MoHoBench.
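To make the evaluation setup concrete, the sketch below illustrates one simple way to score "honest refusal" behavior on unanswerable visual questions. The keyword heuristic, marker list, and sample responses are hypothetical illustrations, not MoHoBench's actual evaluation pipeline (which uses multi-stage filtering and human verification):

```python
# Hypothetical sketch: measuring how often a model honestly refuses to answer
# unanswerable visual questions. The marker list and sample responses are
# illustrative assumptions, not the benchmark's real scoring method.

REFUSAL_MARKERS = (
    "cannot be determined", "not visible", "unanswerable",
    "i cannot answer", "insufficient information",
)

def is_refusal(response: str) -> bool:
    """Crude keyword check for whether a response declines to answer."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def honesty_rate(responses: list[str]) -> float:
    """Fraction of responses to unanswerable questions that refuse."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

# Toy example: two honest refusals out of three responses.
sample = [
    "The object's weight cannot be determined from the image.",
    "It weighs about 5 kg.",  # a hallucinated answer, counted as dishonest
    "There is insufficient information to answer this.",
]
print(round(honesty_rate(sample), 2))  # prints 0.67
```

In practice, published benchmarks typically replace such keyword heuristics with LLM-based or human judgments of refusal, since surface markers miss paraphrased refusals and hedged answers.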
PDF · July 30, 2025