

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

December 19, 2025
作者: Jiaqi Tang, Jianmin Chen, Wei Wei, Xiaogang Xu, Runtao Liu, Xiangyu Wu, Qipeng Xie, Jiafei Wu, Lei Zhang, Qifeng Chen
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) struggle to maintain reliable performance under extreme real-world visual degradations, which impedes their practical robustness. Existing robust MLLMs predominantly rely on implicit training or adaptation that focuses solely on visual encoder generalization, and they suffer from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning to build degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To facilitate this approach, we introduce a specialized 11K-sample dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, with each sample annotated with a structured chain linking degradation parameters, perceptual influence, the pristine semantic reasoning chain, and the conclusion. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.
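The abstract does not spell out the reward formulation, so the following is a minimal Python sketch of how mechanisms (ii) and (iii) might compose: a reward that combines answer correctness with degradation-parameter estimation accuracy, and a reasoning-depth budget that grows with degradation intensity. All names, weights, and thresholds here are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch, NOT the paper's formulation: a reward combining
# (ii) alignment on perceived degradation parameters with
# (iii) a reasoning-depth budget scaled to degradation intensity.

from dataclasses import dataclass


@dataclass
class Rollout:
    answer_correct: bool        # did the model reach the right conclusion?
    predicted_intensity: float  # model's estimate of degradation strength, in [0, 1]
    true_intensity: float       # ground-truth degradation strength, in [0, 1]
    reasoning_tokens: int       # length of the emitted reasoning chain


def depth_budget(intensity: float, base: int = 128, scale: int = 512) -> int:
    """Allowed reasoning length grows with degradation intensity (assumed form)."""
    return base + int(scale * intensity)


def reward(r: Rollout, w_task: float = 1.0, w_percept: float = 0.5,
           w_depth: float = 0.2) -> float:
    """Weighted sum of task, perception, and depth-compliance terms (all hypothetical)."""
    task = 1.0 if r.answer_correct else 0.0
    # Perception term: closer degradation-parameter estimates score higher.
    percept = 1.0 - abs(r.predicted_intensity - r.true_intensity)
    # Depth term: penalize reasoning that overshoots the intensity-scaled budget.
    budget = depth_budget(r.true_intensity)
    depth = 1.0 if r.reasoning_tokens <= budget else budget / r.reasoning_tokens
    return w_task * task + w_percept * percept + w_depth * depth


if __name__ == "__main__":
    rollout = Rollout(answer_correct=True, predicted_intensity=0.7,
                      true_intensity=0.6, reasoning_tokens=400)
    print(f"reward = {reward(rollout):.3f}")
```

Under this sketch, a heavily degraded input both licenses a longer reasoning chain and places more weight on getting the degradation parameters right, which matches the abstract's description of depth scaling at a high level.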