Understanding Alignment in Multimodal LLMs: A Comprehensive Study
July 2, 2024
Authors: Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch
cs.AI
Abstract
Preference alignment has become a crucial component in enhancing the
performance of Large Language Models (LLMs), yet its impact in Multimodal Large
Language Models (MLLMs) remains comparatively underexplored. Similar to
language models, MLLMs for image understanding tasks encounter challenges like
hallucination. In MLLMs, hallucination can occur not only by stating incorrect
facts but also by producing responses that are inconsistent with the image
content. A primary objective of alignment for MLLMs is to encourage these
models to align responses more closely with image information. Recently,
multiple works have introduced preference datasets for MLLMs and examined
different alignment methods, including Direct Preference Optimization (DPO) and
Proximal Policy Optimization (PPO). However, due to variations in datasets,
base model types, and alignment methods, it remains unclear which specific
elements contribute most significantly to the reported improvements in these
works. In this paper, we independently analyze each aspect of preference
alignment in MLLMs. We start by categorizing the alignment algorithms into two
groups, offline (such as DPO), and online (such as online-DPO), and show that
combining offline and online methods can improve the performance of the model
in certain scenarios. We review a variety of published multimodal preference
datasets and discuss how the details of their construction impact model
performance. Based on these insights, we introduce a novel way of creating
multimodal preference data called Bias-Driven Hallucination Sampling (BDHS)
that needs neither additional annotation nor external models, and show that it
can achieve performance competitive with previously published alignment work for
multimodal models across a range of benchmarks.
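For readers unfamiliar with the offline objective referenced above, a minimal sketch of the standard DPO loss follows, written in the conventional notation of the original DPO formulation (Rafailov et al., 2023); the symbols are the usual ones and are not taken from this paper:

% \pi_\theta: the policy being aligned; \pi_{\mathrm{ref}}: a frozen reference model;
% (x, y_w, y_l): a prompt (including the image, in the MLLM setting) with preferred and dispreferred responses;
% \beta: a temperature controlling how far the policy may drift from the reference.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]

Online variants such as online-DPO optimize the same form of loss but draw the compared responses from the current policy during training rather than from a fixed preference dataset.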