RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
December 1, 2023
Authors: Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and
interaction. However, existing MLLMs prevalently suffer from serious
hallucination problems, generating text that is not factually grounded in
associated images. The problem makes existing MLLMs untrustworthy and thus
impractical in real-world (especially high-stakes) applications. To address the
challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior
alignment from fine-grained correctional human feedback. Specifically, RLHF-V
collects human preferences in the form of segment-level corrections on
hallucinations, and performs dense direct preference optimization over the
human feedback. Comprehensive experiments on five benchmarks in both automatic
and human evaluation show that RLHF-V can enable substantially more
trustworthy MLLM behaviors with promising data and computation efficiency.
Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the
hallucination rate of the base MLLM by 34.8%, outperforming the concurrent
LLaVA-RLHF trained on 10k annotated data. The final model achieves
state-of-the-art performance in trustworthiness among open-source MLLMs, and
shows better robustness than GPT-4V in preventing hallucinations arising from
over-generalization. We open-source our code, model, and data at
https://github.com/RLHF-V/RLHF-V.
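
The abstract names two ingredients, segment-level corrections and dense direct preference optimization, without giving the loss. The sketch below is one plausible reading and not the authors' implementation: it takes the standard DPO objective and, as an assumption, scores each response with a weighted average of token log-probabilities in which tokens inside a human-corrected segment count more. The function names, the weight `gamma`, and the temperature `beta` are hypothetical placeholders, and the numbers in the usage example are made up for illustration.

```python
import math


def weighted_logprob(token_logps, corrected_mask, gamma=2.0):
    """Weighted average of token log-probs (assumed form, not from the abstract).

    Tokens flagged in corrected_mask get weight gamma; all others get weight 1.
    """
    weights = [gamma if is_corrected else 1.0 for is_corrected in corrected_mask]
    total = sum(w * lp for w, lp in zip(weights, token_logps))
    return total / sum(weights)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), computed in a stable way."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical per-token log-probs for a corrected (chosen) response and the
# original hallucinated (rejected) response; the third token is the edited one.
mask = [False, False, True, False]
policy_c = weighted_logprob([-0.2, -0.3, -0.4, -0.1], mask)
policy_r = weighted_logprob([-0.2, -0.3, -2.5, -0.1], mask)
ref_c = weighted_logprob([-0.2, -0.3, -1.5, -0.1], mask)
ref_r = weighted_logprob([-0.2, -0.3, -1.0, -0.1], mask)

print(dpo_loss(policy_c, policy_r, ref_c, ref_r))
```

Because the chosen and rejected responses differ only in the corrected segment, up-weighting those tokens concentrates the preference signal on exactly the text the annotator changed, which is the intuition behind calling the feedback "dense".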