

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

June 11, 2024
Authors: Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu
cs.AI

Abstract

Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by multimodality and underscoring the necessity for advanced methodologies to enhance their reliability. For instance, typical proprietary models still struggle with the perception of visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose privacy in text and reveal ideological and cultural biases even when paired with irrelevant images in inference, indicating that multimodality amplifies the internal risks from base LLMs. Additionally, we release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future advancements in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.
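The abstract describes a benchmark that scores models across five trustworthiness aspects over many tasks, released as a standardized toolbox. As a rough illustration of how such an evaluation loop might be structured, here is a minimal, hypothetical sketch in Python. None of the names below (`Task`, `evaluate`, the scorer signature) come from the MultiTrust toolbox; they are assumptions made purely to show the shape of a per-aspect aggregation.

```python
# Hypothetical sketch of a standardized trustworthiness benchmark loop.
# The class and function names are illustrative assumptions, not the
# MultiTrust API: each Task belongs to one aspect, and evaluate()
# averages per-prompt scores within each aspect.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ASPECTS = ["truthfulness", "safety", "robustness", "fairness", "privacy"]

@dataclass
class Task:
    name: str
    aspect: str
    # scorer maps a model's answer to a score in [0, 1]
    scorer: Callable[[str], float]
    prompts: List[str] = field(default_factory=list)

def evaluate(model: Callable[[str], str],
             tasks: List[Task]) -> Dict[str, float]:
    """Run every task and return the mean score per aspect."""
    scores: Dict[str, List[float]] = {a: [] for a in ASPECTS}
    for task in tasks:
        for prompt in task.prompts:
            scores[task.aspect].append(task.scorer(model(prompt)))
    return {a: sum(s) / len(s) for a, s in scores.items() if s}

# Toy usage: a "model" that always refuses, scored on one safety task.
refusing_model = lambda prompt: "I cannot help with that."
safety_task = Task(
    name="jailbreak-refusal",
    aspect="safety",
    scorer=lambda ans: 1.0 if "cannot" in ans else 0.0,
    prompts=["<jailbreak attempt>"],
)
print(evaluate(refusing_model, [safety_task]))  # {'safety': 1.0}
```

Separating task definitions from the evaluation loop is what makes such a toolbox "scalable" in the sense the abstract suggests: new tasks or aspects can be registered without touching the aggregation code.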

