RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Abstract

Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs commonly suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. This makes existing MLLMs untrustworthy and therefore impractical in real-world applications, especially high-stakes ones. To address this challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preferences in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over this human feedback. Comprehensive experiments on five benchmarks, in both automatic and human evaluation, show that RLHF-V enables substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization. We open-source our code, model, and data at https://github.com/RLHF-V/RLHF-V.
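The abstract names dense direct preference optimization over segment-level corrections but does not spell out the objective. The sketch below is one plausible reading, not the authors' implementation: a standard DPO loss in which the sequence score up-weights tokens that fall inside human-corrected segments. The function names, the weight `gamma`, the temperature `beta`, and the length normalization are all illustrative assumptions.

```python
import math

def weighted_logprob(token_logps, corrected_mask, gamma=5.0):
    """Sequence score where tokens in corrected segments count `gamma` times.

    token_logps:    per-token log-probabilities of one response
    corrected_mask: True where the token belongs to a human-corrected segment
    gamma:          hypothetical up-weighting factor (assumption)
    """
    weights = [gamma if corrected else 1.0 for corrected in corrected_mask]
    total = sum(w * lp for w, lp in zip(weights, token_logps))
    return total / sum(weights)  # normalize by total weight (assumption)

def dpo_loss(policy_pref, policy_rej, ref_pref, ref_rej, beta=0.5):
    """Standard DPO loss, -log(sigmoid(margin)), on sequence-level scores."""
    margin = beta * ((policy_pref - ref_pref) - (policy_rej - ref_rej))
    # Numerically stable softplus(-margin)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Toy example: the corrected response differs from the original in one segment.
pref_score = weighted_logprob([-0.2, -0.4, -0.3], [False, True, True])
rej_score = weighted_logprob([-0.2, -1.5, -1.1], [False, True, True])
loss = dpo_loss(pref_score, rej_score, ref_pref=-0.5, ref_rej=-0.5)
```

Because the preferred and rejected responses differ only in the corrected segments, weighting those tokens more heavily concentrates the learning signal on exactly the spans a human annotator changed.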

