Aligning Large Multimodal Models with Factually Augmented RLHF
September 25, 2023
Authors: Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
cs.AI
Abstract
Large Multimodal Models (LMM) are built across modalities and the
misalignment between two modalities can result in "hallucination", generating
textual outputs that are not grounded by the multimodal information in context.
To address the multimodal misalignment issue, we adapt the Reinforcement
Learning from Human Feedback (RLHF) from the text domain to the task of
vision-language alignment, where human annotators are asked to compare two
responses and pinpoint the more hallucinated one, and the vision-language model
is trained to maximize the simulated human rewards. We propose a new alignment
algorithm called Factually Augmented RLHF that augments the reward model with
additional factual information such as image captions and ground-truth
multi-choice options, which alleviates the reward hacking phenomenon in RLHF
and further improves the performance. We also enhance the GPT-4-generated
training data (for vision instruction tuning) with previously available
human-written image-text pairs to improve the general capabilities of our
model. To evaluate the proposed approach in real-world scenarios, we develop a
new evaluation benchmark MMHAL-BENCH with a special focus on penalizing
hallucinations. As the first LMM trained with RLHF, our approach achieves
remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the
performance level of the text-only GPT-4 (while previous best methods only
achieve the 87% level), and a 60% improvement on MMHAL-BENCH over other
baselines. We open-source our code, model, and data at
https://llava-rlhf.github.io.
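To illustrate the core idea behind Factually Augmented RLHF, below is a minimal
sketch of a reward model whose input is augmented with factual information
(e.g., a ground-truth image caption) alongside the candidate response. The class
name, helper function, prompt format, and base encoder are illustrative
assumptions for this sketch, not the authors' actual implementation.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FactAugmentedRewardModel(nn.Module):
    """Text encoder with a scalar value head that scores a response
    given the question and additional factual context (assumed design)."""

    def __init__(self, base_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_name)
        self.value_head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Score the whole sequence using the first-token representation.
        return self.value_head(out.last_hidden_state[:, 0]).squeeze(-1)

def build_reward_input(question: str, response: str, facts: str) -> str:
    # Factual augmentation: ground-truth captions / multi-choice answers are
    # appended so the reward model can cross-check the response against them.
    return f"Question: {question}\nFacts: {facts}\nResponse: {response}"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
reward_model = FactAugmentedRewardModel()
batch = tokenizer(
    [build_reward_input(
        question="What color is the bus?",
        response="The bus is red.",
        facts="Caption: a red double-decker bus parked on the street.",
    )],
    return_tensors="pt", padding=True, truncation=True,
)
with torch.no_grad():
    reward = reward_model(batch["input_ids"], batch["attention_mask"])
# Higher reward is intended to indicate a less hallucinated response.

The design intuition is that the reward model can check the candidate response
against the supplied facts rather than relying solely on its own grounding,
which is what the abstract credits with alleviating reward hacking during RLHF.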