On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

February 20, 2025
Authors: Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, Jaehong Yoon, Jieyu Zhang, Kai Shu, Kaijie Zhu, Ranjay Krishna, Swabha Swayamdipta, Taiwei Shi, Weijia Shi, Xiang Li, Yiwei Li, Yuexing Hao, Zhihao Jia, Zhize Li, Xiuying Chen, Zhengzhong Tu, Xiyang Hu, Tianyi Zhou, Jieyu Zhao, Lichao Sun, Furong Huang, Or Cohen Sasson, Prasanna Sattigeri, Anka Reuel, Max Lamparth, Yue Zhao, Nouha Dziri, Yu Su, Huan Sun, Heng Ji, Chaowei Xiao, Mohit Bansal, Nitesh V. Chawla, Jian Pei, Jianfeng Gao, Michael Backes, Philip S. Yu, Neil Zhenqiang Gong, Pin-Yu Chen, Bo Li, Xiangliang Zhang
cs.AI

Abstract

Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components (metadata curation, test case generation, and contextual variation) to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs. This discussion reveals the complex, evolving nature of trustworthiness, highlights the nuanced trade-offs between utility and trustworthiness, considers various downstream applications, identifies persistent challenges, and provides a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.
