OmniVerifier-M1: A Multimodal Meta-Verifier with Explicit Structural Recalibration

Abstract

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals, and explore how to effectively incorporate meta-verification feedback into multimodal verifier training. We identify two key findings. First, symbolic verifier outputs (e.g., bounding boxes) outperform textual explanations as meta-verification rationales, enabling efficient rule-based reinforcement learning rewards while avoiding reliance on model-based rewards from auxiliary judge models. Second, decoupling reinforcement learning objectives for binary judgment and meta-verification substantially outperforms joint reward optimization, due to intrinsic differences in output structure and learning dynamics. Based on these insights, we train OmniVerifier-M1, a generalist visual verifier leveraging symbolic meta-verification and decoupled reinforcement learning. OmniVerifier-M1 provides robust verification and fine-grained error localization, and further enables M1-TTS, a verifier-driven agentic generation system achieving dynamic region-level self-correction. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.
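The abstract does not specify how the rule-based rewards are computed. As a rough illustration only, the sketch below assumes an intersection-over-union (IoU) score between predicted and reference bounding boxes for the meta-verification rationale, and a separate exact-match score for the binary judgment, so that the two signals can be optimized independently rather than summed into one joint reward. All function names, the box format, and the matching rule are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def judgment_reward(predicted: bool, reference: bool) -> float:
    """Hypothetical rule-based reward for the binary judgment: exact match."""
    return 1.0 if predicted == reference else 0.0


def localization_reward(predicted: Sequence[Box], reference: Sequence[Box]) -> float:
    """Hypothetical rule-based reward for the symbolic rationale.

    Averages, over reference boxes, the best IoU achieved by any predicted
    box. This simple rule does not penalize extra predicted boxes.
    """
    if not reference:
        # Nothing to localize: reward only an empty prediction.
        return 1.0 if not predicted else 0.0
    if not predicted:
        return 0.0
    best = [max(iou(r, p) for p in predicted) for r in reference]
    return sum(best) / len(best)
```

Both rewards are computed by fixed rules, so no auxiliary judge model is needed. Keeping them as two separate functions mirrors the decoupled setup described in the abstract, although the actual training procedure may differ.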