Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Abstract

Preference alignment has become a crucial component in improving the performance of Large Language Models (LLMs), yet its impact on Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Like language models, MLLMs for image understanding tasks face challenges such as hallucination. In MLLMs, hallucination can occur not only by stating incorrect facts but also by producing responses that are inconsistent with the image content. A primary objective of alignment for MLLMs is to encourage these models to align responses more closely with image information. Recently, multiple works have introduced preference datasets for MLLMs and examined different alignment methods, including Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). However, because these works differ in datasets, base model types, and alignment methods, it remains unclear which specific elements contribute most to the improvements they report. In this paper, we independently analyze each aspect of preference alignment in MLLMs. We start by categorizing the alignment algorithms into two groups, offline (such as DPO) and online (such as online-DPO), and show that combining offline and online methods can improve model performance in certain scenarios. We review a variety of published multimodal preference datasets and discuss how the details of their construction affect model performance. Based on these insights, we introduce a novel way of creating multimodal preference data called Bias-Driven Hallucination Sampling (BDHS), which needs neither additional annotation nor external models, and show that it achieves performance competitive with previously published alignment work for multimodal models across a range of benchmarks.
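
As an illustration of the offline category mentioned above, the standard DPO objective for a single preference pair can be sketched as follows. This is a minimal sketch, not the paper's implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed sequence log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    All inputs are summed log-probabilities of a full response.
    beta=0.1 is an assumed default, not a value taken from the paper.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)), computed in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The policy favors the chosen response more than the reference does,
# so the loss falls below log(2), the value at zero margin.
example = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In an offline setting, the chosen and rejected responses come from a fixed preference dataset. An online variant would typically apply a similar loss to pairs built from responses sampled from the current policy during training.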
