Aligning Large Multimodal Models with Factually Augmented RLHF
September 25, 2023
Authors: Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
cs.AI
Abstract
Large Multimodal Models (LMM) are built across modalities and the
misalignment between two modalities can result in "hallucination", generating
textual outputs that are not grounded by the multimodal information in context.
To address the multimodal misalignment issue, we adapt the Reinforcement
Learning from Human Feedback (RLHF) from the text domain to the task of
vision-language alignment, where human annotators are asked to compare two
responses and pinpoint the more hallucinated one, and the vision-language model
is trained to maximize the simulated human rewards. We propose a new alignment
algorithm called Factually Augmented RLHF that augments the reward model with
additional factual information such as image captions and ground-truth
multi-choice options, which alleviates the reward hacking phenomenon in RLHF
and further improves the performance. We also enhance the GPT-4-generated
training data (for vision instruction tuning) with previously available
human-written image-text pairs to improve the general capabilities of our
model. To evaluate the proposed approach in real-world scenarios, we develop a
new evaluation benchmark MMHAL-BENCH with a special focus on penalizing
hallucinations. As the first LMM trained with RLHF, our approach achieves
remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the
performance level of the text-only GPT-4 (while previous best methods only
achieve the 87% level), and a 60% improvement on MMHAL-BENCH over other
baselines. We open-source our code, model, and data at
https://llava-rlhf.github.io.
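To illustrate the core idea behind Factually Augmented RLHF, below is a minimal
sketch of a reward model whose input is augmented with factual information
(e.g., a ground-truth image caption) alongside the candidate response. The class
name, helper function, prompt format, and base encoder are illustrative
assumptions for this sketch, not the authors' actual implementation.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FactAugmentedRewardModel(nn.Module):
    """Text encoder with a scalar value head that scores a response
    given the question and additional factual context (assumed design)."""

    def __init__(self, base_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_name)
        self.value_head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Score the whole sequence using the first-token representation.
        return self.value_head(out.last_hidden_state[:, 0]).squeeze(-1)

def build_reward_input(question: str, response: str, facts: str) -> str:
    # Factual augmentation: ground-truth captions / multi-choice answers are
    # appended so the reward model can cross-check the response against them.
    return f"Question: {question}\nFacts: {facts}\nResponse: {response}"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
reward_model = FactAugmentedRewardModel()
batch = tokenizer(
    [build_reward_input(
        question="What color is the bus?",
        response="The bus is red.",
        facts="Caption: a red double-decker bus parked on the street.",
    )],
    return_tensors="pt", padding=True, truncation=True,
)
with torch.no_grad():
    reward = reward_model(batch["input_ids"], batch["attention_mask"])
# Higher reward is intended to indicate a less hallucinated response.

The design intuition is that the reward model can check the candidate response
against the supplied facts rather than relying solely on its own grounding,
which is what the abstract credits with alleviating reward hacking during RLHF.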