

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

February 25, 2025
Authors: Xiangyu Zhao, Shengyuan Ding, Zicheng Zhang, Haian Huang, Maosong Cao, Weiyun Wang, Jiaqi Wang, Xinyu Fang, Wenhai Wang, Guangtao Zhai, Haodong Duan, Hua Yang, Kai Chen
cs.AI

Abstract

Recent advancements in open-source multi-modal large language models (MLLMs) have primarily focused on enhancing foundational capabilities, leaving a significant gap in human preference alignment. This paper introduces OmniAlign-V, a comprehensive dataset of 200K high-quality training samples featuring diverse images, complex questions, and varied response formats to improve MLLMs' alignment with human preferences. We also present MM-AlignBench, a human-annotated benchmark specifically designed to evaluate MLLMs' alignment with human values. Experimental results show that finetuning MLLMs with OmniAlign-V, using Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), significantly enhances human preference alignment while maintaining or enhancing performance on standard VQA benchmarks, preserving their fundamental capabilities. Our datasets, benchmark, code and checkpoints have been released at https://github.com/PhoenixZ810/OmniAlign-V.
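For context, the DPO fine-tuning referenced in the abstract optimizes the standard preference objective of Rafailov et al. (2023) over chosen/rejected response pairs. The sketch below is a minimal, illustrative PyTorch implementation of that loss under generic assumptions; the helper name `dpo_loss` and the toy inputs are hypothetical and are not taken from the OmniAlign-V repository, which should be consulted for the actual training setup.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective (Rafailov et al., 2023).

    Each argument is a 1-D tensor of per-sample summed log-probabilities of the
    chosen / rejected response under the trainable policy or the frozen
    reference model; beta controls the strength of the KL-style regularization.
    """
    # Implicit reward: log-ratio of policy vs. reference for each response
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

In practice, the per-response log-probabilities are obtained by summing token log-probabilities of the chosen and rejected answers under the fine-tuned MLLM and a frozen reference copy of the same model.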

