Aligning Multimodal LLM with Human Preference: A Survey
March 18, 2025
Authors: Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan
cs.AI
Abstract
Large language models (LLMs) can handle a wide variety of general tasks with
simple prompts, without the need for task-specific training. Multimodal Large
Language Models (MLLMs), built upon LLMs, have demonstrated impressive
potential in tackling complex tasks involving visual, auditory, and textual
data. However, critical issues related to truthfulness, safety, o1-like
reasoning, and alignment with human preference remain insufficiently addressed.
This gap has spurred the emergence of various alignment algorithms, each
targeting different application scenarios and optimization goals. Recent
studies have shown that alignment algorithms are a powerful approach to
resolving the aforementioned challenges. In this paper, we aim to provide a
comprehensive and systematic review of alignment algorithms for MLLMs.
Specifically, we explore four key aspects: (1) the application scenarios
covered by alignment algorithms, including general image understanding,
multi-image, video, and audio, and extended multimodal applications; (2) the
core factors in constructing alignment datasets, including data sources, model
responses, and preference annotations; (3) the benchmarks used to evaluate
alignment algorithms; and (4) a discussion of potential future directions for
the development of alignment algorithms. This work seeks to help researchers
organize current advancements in the field and inspire better alignment
methods. The project page of this paper is available at
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.