Reliable and Responsible Foundation Models: A Comprehensive Survey
February 4, 2026
Authors: Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, Shengbang Tong, Lingfeng Shen, Rafael Rafailov, Runjia Li, Zhaoyang Wang, Yiyang Zhou, Chenhang Cui, Yu Wang, Wenhao Zheng, Huichi Zhou, Jindong Gu, Zhaorun Chen, Peng Xia, Tony Lee, Thomas Zollo, Vikash Sehwag, Jixuan Leng, Jiuhai Chen, Yuxin Wen, Huan Zhang, Zhun Deng, Linjun Zhang, Pavel Izmailov, Pang Wei Koh, Yulia Tsvetkov, Andrew Wilson, Jiaheng Zhang, James Zou, Cihang Xie, Hao Wang, Philip Torr, Julian McAuley, David Alvarez-Melis, Florian Tramèr, Kaidi Xu, Suman Jana, Chris Callison-Burch, Rene Vidal, Filippos Kokkinos, Mohit Bansal, Beidi Chen, Huaxiu Yao
cs.AI
Abstract
Foundation models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models (i.e., Text-to-Image Models and Image-Editing Models), and Video Generative Models, have become essential tools with broad applications across domains such as law, medicine, education, finance, and science. As these models see increasing real-world deployment, ensuring their reliability and responsibility has become critical for academia, industry, and government. This survey addresses the reliable and responsible development of foundation models. We explore critical issues, including bias and fairness, security and privacy, uncertainty, explainability, and distribution shift. We also cover model limitations, such as hallucinations, as well as methods like alignment and Artificial Intelligence-Generated Content (AIGC) detection. For each area, we review the current state of the field and outline concrete future research directions. Additionally, we discuss the intersections between these areas, highlighting their connections and shared challenges. We hope our survey fosters the development of foundation models that are not only powerful but also ethical, trustworthy, reliable, and socially responsible.