GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
May 16, 2025
Authors: Yue Liu, Shengfang Zhai, Mingzhe Du, Yulin Chen, Tri Cao, Hongcheng Gao, Cheng Wang, Xinfeng Li, Kun Wang, Junfeng Fang, Jiaheng Zhang, Bryan Hooi
cs.AI
Abstract
To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to use online reinforcement learning (RL) to incentivize the guard model to reason deliberately before making moderation decisions. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. Based on this corpus, we cold-start the model's reasoning ability via supervised fine-tuning (SFT). We then further enhance moderation-related reasoning through online RL. Concretely, to increase the diversity and difficulty of training samples, we conduct rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. In addition, we use a dynamic clipping parameter to encourage exploration in the early stages of training and exploitation in the later stages. To balance performance and token efficiency, we design a length-aware safety reward that integrates accuracy, format, and token cost. Extensive experiments demonstrate the superiority of our model. Remarkably, it surpasses the runner-up by 19.27% F1 score on average. We release the data, code, and models (3B/7B) of GuardReasoner-VL at https://github.com/yueliu1999/GuardReasoner-VL/.
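The abstract names three training mechanics; the sketches below illustrate how each could look, under stated assumptions. First, safety-aware data concatenation: the abstract does not specify how labels are handled, so this minimal sketch assumes the concatenated sample is labeled unsafe if either constituent is unsafe. The `Sample` class and function names are illustrative, not the paper's API.

```python
import random
from dataclasses import dataclass

@dataclass
class Sample:
    """A moderation training sample (field names are illustrative)."""
    text: str
    unsafe: bool  # ground-truth moderation label

def safety_aware_concat(a: Sample, b: Sample, sep: str = "\n") -> Sample:
    """Concatenate two samples into one harder sample.

    Assumption (not from the paper): the combined sample is unsafe
    if either constituent is unsafe, keeping the label consistent.
    """
    return Sample(text=a.text + sep + b.text, unsafe=a.unsafe or b.unsafe)

def augment(pool: list[Sample], n_pairs: int, seed: int = 0) -> list[Sample]:
    """Produce n_pairs concatenated samples drawn at random from the pool."""
    rng = random.Random(seed)
    return [safety_aware_concat(*rng.sample(pool, 2)) for _ in range(n_pairs)]
```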
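Second, the dynamic clipping parameter. A plausible reading is a PPO/GRPO-style clipped surrogate whose clipping range starts wide (more exploration) and narrows over training (more exploitation). The linear schedule and the endpoint values below are assumptions, not the paper's settings.

```python
import torch

def clip_epsilon(step: int, total_steps: int,
                 eps_start: float = 0.3, eps_end: float = 0.1) -> float:
    """Linearly anneal the clipping range: wide early (exploration),
    narrow late (exploitation). Endpoint values are illustrative."""
    frac = min(step / max(total_steps, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def clipped_surrogate(logp_new: torch.Tensor, logp_old: torch.Tensor,
                      advantages: torch.Tensor, eps: float) -> torch.Tensor:
    """Standard PPO-style clipped objective, here with a dynamic eps."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # loss to minimize
```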
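Third, the length-aware safety reward, which the abstract says integrates accuracy, format, and token cost. The weights, token budget, and linear over-budget penalty below are all assumptions for illustration; the paper defines its own functional form.

```python
def length_aware_safety_reward(pred_label: str, true_label: str,
                               well_formatted: bool, n_tokens: int,
                               token_budget: int = 512,
                               w_acc: float = 1.0, w_fmt: float = 0.2,
                               w_len: float = 0.2) -> float:
    """Illustrative composite reward: accuracy + format - token cost.

    All weights, the budget, and the penalty shape are assumptions.
    """
    acc = 1.0 if pred_label == true_label else 0.0
    fmt = 1.0 if well_formatted else 0.0
    # Penalize output length only beyond the budget, capped at 1.
    over = max(n_tokens - token_budget, 0) / token_budget
    return w_acc * acc + w_fmt * fmt - w_len * min(over, 1.0)

# Example: a correct, well-formatted verdict using 700 output tokens
r = length_aware_safety_reward("unsafe", "unsafe", True, 700)
```

Tying length into the reward this way discourages needlessly long reasoning traces without punishing outputs that stay within budget, which matches the abstract's stated goal of balancing performance and token efficiency.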