Aligning Large Multimodal Models with Factually Augmented RLHF
September 25, 2023
Authors: Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
cs.AI
Abstract
Large Multimodal Models (LMM) are built across modalities and the
misalignment between two modalities can result in "hallucination", generating
textual outputs that are not grounded by the multimodal information in context.
To address the multimodal misalignment issue, we adapt the Reinforcement
Learning from Human Feedback (RLHF) from the text domain to the task of
vision-language alignment, where human annotators are asked to compare two
responses and pinpoint the more hallucinated one, and the vision-language model
is trained to maximize the simulated human rewards. We propose a new alignment
algorithm called Factually Augmented RLHF that augments the reward model with
additional factual information such as image captions and ground-truth
multi-choice options, which alleviates the reward hacking phenomenon in RLHF
and further improves the performance. We also enhance the GPT-4-generated
training data (for vision instruction tuning) with previously available
human-written image-text pairs to improve the general capabilities of our
model. To evaluate the proposed approach in real-world scenarios, we develop a
new evaluation benchmark MMHAL-BENCH with a special focus on penalizing
hallucinations. As the first LMM trained with RLHF, our approach achieves a
remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the
performance level of the text-only GPT-4 (while previous best methods only
reach the 87% level), and a 60% improvement on MMHAL-BENCH over other
baselines. We open-source our code, model, and data at
https://llava-rlhf.github.io.
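
To make the Factually Augmented RLHF idea more concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation) of a reward model whose input is augmented with extra factual text, such as an image caption and ground-truth multi-choice options, before scoring a pair of responses. It uses a small text-only backbone (`gpt2`) as a stand-in for the vision-language reward model, and the names `reward`, `score_head`, and `preference_loss` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Text-only proxy backbone; the paper's reward model is initialized from a
# vision-language model, which is abstracted away in this sketch.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
backbone = AutoModel.from_pretrained("gpt2")
score_head = torch.nn.Linear(backbone.config.hidden_size, 1)  # scalar reward head (illustrative)


def reward(prompt: str, response: str, caption: str = "", options: str = "") -> torch.Tensor:
    """Score one response; `caption` and `options` are the factual augmentation."""
    facts = ""
    if caption:
        facts += f"\nImage caption: {caption}"
    if options:
        facts += f"\nGround-truth options: {options}"
    text = f"{prompt}{facts}\nResponse: {response}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    hidden = backbone(**inputs).last_hidden_state[:, -1]  # last-token representation
    return score_head(hidden).squeeze(-1)


def preference_loss(prompt, preferred, rejected, caption="", options=""):
    """Pairwise (Bradley-Terry style) loss: the less hallucinated response should score higher."""
    r_pos = reward(prompt, preferred, caption, options)
    r_neg = reward(prompt, rejected, caption, options)
    return -F.logsigmoid(r_pos - r_neg).mean()
```

The design intuition is that with the caption and ground-truth options visible in the reward model's context, a hallucinated response contradicts text the scorer can directly check, making it harder for the policy to exploit a reward model that would otherwise have to judge factuality from the image features alone.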