Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

Abstract

Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, which makes single-chain reasoning prone to premature perceptual commitment and hallucination. We propose Visual Para-Thinker++, a single-policy multi-agent framework in which one shared multimodal large language model (MLLM) policy is instantiated as role-conditioned Main, Worker, and Summary Agents. The Main Agent decomposes the task using fixed allocation patterns; Worker Agents reason in parallel under context isolation; and the Summary Agent reconciles the full Worker reasoning traces rather than majority-voting on final labels. The shared policy is trained with Multi-Agent Capability Injection and Role-Decoupled Multi-Agent Optimization, which assign role-specific rewards and advantages to the corresponding token segments to reduce gradient conflict among the collaborating roles. A native inference engine makes multi-agent rollout efficient by sharing the visual prefix and reusing the KV cache. Across V*, CountBench, the RefCOCO family, and HallusionBench, Visual Para-Thinker++ consistently outperforms single-trajectory and inference-time parallel baselines, with especially strong gains on hallucination-sensitive visual reasoning.
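
The Main, Worker, and Summary flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Policy` signature, `run_pipeline`, `toy_policy`, the one-sub-task-per-line plan format, and the thread-based parallelism are all assumptions made for illustration, and the image input, training procedure, and KV-cache reuse are omitted entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical interface: (role, prompt) -> generated text.
# One callable stands in for the single shared policy; the role string
# plays the part of role conditioning.
Policy = Callable[[str, str], str]


def run_pipeline(policy: Policy, question: str, num_workers: int = 3) -> str:
    # Main Agent: decompose the task. Assumed format: one sub-task per line.
    plan = policy("main", question)
    subtasks = [ln.strip() for ln in plan.splitlines() if ln.strip()][:num_workers]

    # Worker Agents: each prompt holds only the question and its own sub-task,
    # so no Worker sees another Worker's reasoning (context isolation).
    def work(subtask: str) -> str:
        return policy("worker", f"{question}\nSub-task: {subtask}")

    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        traces = list(pool.map(work, subtasks))

    # Summary Agent: receives the full traces, not just final labels,
    # so it can reconcile disagreements instead of majority-voting.
    joined = "\n---\n".join(traces)
    return policy("summary", f"{question}\nWorker traces:\n{joined}")


def toy_policy(role: str, prompt: str) -> str:
    # Deterministic stand-in for the model, used only to exercise the control flow.
    if role == "main":
        return "count left half\ncount right half"
    if role == "worker":
        return "trace for: " + prompt.splitlines()[-1]
    return f"summary over {prompt.count('trace for:')} traces"


if __name__ == "__main__":
    print(run_pipeline(toy_policy, "How many cups are on the table?"))
```

With the toy policy, the Main Agent emits two sub-tasks, two isolated Workers each return a trace, and the Summary Agent sees both traces in full. Swapping `toy_policy` for a real model call would leave the control flow unchanged.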