MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Abstract

Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet they can still produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MLLMs' ability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark consisting of 12k+ visual question samples, whose quality is guaranteed by multi-stage filtering and human verification. Using MoHoBench, we benchmark the honesty of 28 popular MLLMs and conduct a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MLLMs' honesty is not solely a language modeling issue but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. We therefore implement initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs. Our data and code can be found at https://github.com/DSTTSD/MoHoBench.
