Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
June 11, 2024
Authors: Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu
cs.AI
Abstract
Despite the superior capabilities of Multimodal Large Language Models (MLLMs)
across diverse tasks, they still face significant trustworthiness challenges.
Yet, the current literature on assessing the trustworthiness of MLLMs remains
limited, lacking a holistic evaluation that offers thorough insights into future
improvements. In this work, we establish MultiTrust, the first comprehensive
and unified benchmark on the trustworthiness of MLLMs across five primary
aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark
employs a rigorous evaluation strategy that addresses both multimodal risks and
cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets.
Extensive experiments with 21 modern MLLMs reveal some previously unexplored
trustworthiness issues and risks, highlighting the complexities introduced by
multimodality and underscoring the necessity for advanced methodologies to
enhance their reliability. For instance, typical proprietary models still
struggle with the perception of visually confusing images and are vulnerable to
multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to
disclose privacy in text and reveal ideological and cultural biases even when
paired with irrelevant images during inference, indicating that multimodality
amplifies the internal risks of base LLMs. Additionally, we release a
scalable toolbox for standardized trustworthiness research, aiming to
facilitate future advancements in this important field. Code and resources are
publicly available at: https://multi-trust.github.io/.
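The toolbox is described only at a high level in the abstract, so the following is a minimal, hypothetical Python sketch of what a standardized trustworthiness-evaluation harness in this spirit could look like. All names here (TrustTask, evaluate, the per-aspect aggregation) are illustrative assumptions, not the actual MultiTrust API.

from dataclasses import dataclass
from typing import Callable, Dict, List

# The five trust aspects named in the abstract.
ASPECTS = ["truthfulness", "safety", "robustness", "fairness", "privacy"]

@dataclass
class TrustTask:
    # Hypothetical container for one benchmark task.
    name: str
    aspect: str                             # one of ASPECTS
    samples: List[dict]                     # e.g. {"image": ..., "prompt": ...}
    score_fn: Callable[[str, dict], float]  # (model response, sample) -> score in [0, 1]

def evaluate(model_fn: Callable[[dict], str],
             tasks: List[TrustTask]) -> Dict[str, float]:
    # Score every sample of every task, then average task scores per aspect.
    per_aspect: Dict[str, List[float]] = {a: [] for a in ASPECTS}
    for task in tasks:
        scores = [task.score_fn(model_fn(s), s) for s in task.samples]
        per_aspect[task.aspect].append(sum(scores) / len(scores))
    return {a: sum(v) / len(v) for a, v in per_aspect.items() if v}

# Toy usage: a dummy "model" that always refuses, scored on a safety-style task.
if __name__ == "__main__":
    refusal_task = TrustTask(
        name="jailbreak-refusal",
        aspect="safety",
        samples=[{"prompt": "a harmful request", "label": "refuse"}],
        score_fn=lambda resp, s: 1.0 if "cannot" in resp.lower() else 0.0,
    )
    print(evaluate(lambda sample: "I cannot help with that.", [refusal_task]))
    # -> {'safety': 1.0}

Aggregating per aspect rather than per task mirrors the paper's five-aspect reporting, and keeping score_fn task-specific lets heterogeneous metrics (accuracy, refusal rate, attack success rate) sit behind one interface.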