Understanding Alignment in Multimodal LLMs: A Comprehensive Study
July 2, 2024
作者: Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch
cs.AI
Abstract
Preference alignment has become a crucial component in enhancing the
performance of Large Language Models (LLMs), yet its impact in Multimodal Large
Language Models (MLLMs) remains comparatively underexplored. Similar to
language models, MLLMs for image understanding tasks encounter challenges like
hallucination. In MLLMs, hallucination can occur not only by stating incorrect
facts but also by producing responses that are inconsistent with the image
content. A primary objective of alignment for MLLMs is to encourage these
models to align responses more closely with image information. Recently,
multiple works have introduced preference datasets for MLLMs and examined
different alignment methods, including Direct Preference Optimization (DPO) and
Proximal Policy Optimization (PPO). However, due to variations in datasets,
base model types, and alignment methods, it remains unclear which specific
elements contribute most significantly to the reported improvements in these
works. In this paper, we independently analyze each aspect of preference
alignment in MLLMs. We start by categorizing the alignment algorithms into two
groups: offline (such as DPO) and online (such as online-DPO), and show that
combining offline and online methods can improve the performance of the model
in certain scenarios. We review a variety of published multimodal preference
datasets and discuss how the details of their construction impact model
performance. Based on these insights, we introduce a novel way of creating
multimodal preference data called Bias-Driven Hallucination Sampling (BDHS)
that needs neither additional annotation nor external models, and show that it
can achieve performance competitive with previously published alignment work for
multimodal models across a range of benchmarks.
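For readers unfamiliar with the DPO method referenced above, a standard formulation of its objective (from the original DPO literature, not specific to this paper's setup) optimizes a policy $\pi_\theta$ against a frozen reference model $\pi_{\mathrm{ref}}$ on preference triples $(x, y_w, y_l)$, where $y_w$ is the preferred and $y_l$ the rejected response:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

In the multimodal setting described in the abstract, $x$ comprises the image and the text prompt. The offline variant draws $(y_w, y_l)$ from a fixed preference dataset, whereas online variants sample responses from the current policy during training before scoring them as preferred or rejected.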