Aligning Multimodal LLM with Human Preference: A Survey

March 18, 2025
Authors: Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan
cs.AI

Abstract

Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving these challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding; multi-image, video, and audio; and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.
