Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

Abstract

Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, which makes single-chain reasoning prone to early perceptual commitment and hallucination. We propose Visual Para-Thinker++, a single-policy multi-agent framework in which one shared MLLM policy is instantiated as role-conditioned Main, Worker, and Summary Agents. The Main Agent decomposes the task using fixed allocation patterns; Worker Agents reason in parallel under context isolation; and the Summary Agent reconciles the full Worker reasoning traces rather than majority-voting over final labels. The shared policy is trained with Multi-Agent Capability Injection and Role-Decoupled Multi-Agent Optimization, which assign role-specific rewards and advantages to the corresponding token segments to reduce gradient conflict among collaborating roles. A native inference engine enables efficient multi-agent rollout by sharing the visual prefix and reusing the KV cache. Across V*, CountBench, the RefCOCO family, and HallusionBench, Visual Para-Thinker++ consistently outperforms single-trajectory and inference-time parallel baselines, with especially strong gains on hallucination-sensitive visual reasoning.
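
To make the idea of role-specific advantages on token segments concrete, the following is a minimal sketch and not the paper's implementation. It assumes a group of rollouts that each carry one scalar reward per role, and it assumes a group-normalized advantage (a GRPO-style choice made here purely for illustration; the abstract does not specify the estimator). The names `ROLES`, `role_advantages`, and `token_advantages` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical role labels mirroring the Main, Worker, and Summary Agents.
ROLES = ("main", "worker", "summary")


def role_advantages(group_rewards, eps=1e-6):
    """Normalize rewards separately for each role across a group of rollouts.

    group_rewards: list of dicts mapping role -> scalar reward, one per rollout.
    Returns a list of dicts mapping role -> advantage, in the same order.
    Assumption: a mean/std group baseline; the real estimator may differ.
    """
    advantages = [{} for _ in group_rewards]
    for role in ROLES:
        values = [rewards[role] for rewards in group_rewards]
        mu, sigma = mean(values), pstdev(values)
        for adv, value in zip(advantages, values):
            adv[role] = (value - mu) / (sigma + eps)
    return advantages


def token_advantages(token_roles, adv_by_role):
    """Broadcast each role's advantage to the tokens of that role's segment.

    token_roles: one role label per generated token in a rollout.
    Tokens written by one role are never credited with another role's reward.
    """
    return [adv_by_role[role] for role in token_roles]


if __name__ == "__main__":
    # Two toy rollouts with made-up rewards, for illustration only.
    rewards = [
        {"main": 1.0, "worker": 0.0, "summary": 1.0},
        {"main": 0.0, "worker": 1.0, "summary": 1.0},
    ]
    adv = role_advantages(rewards)
    print(token_advantages(["main", "main", "worker", "summary"], adv[0]))
```

Because each role is normalized only against the same role in other rollouts, a weak Worker segment does not pull down the credit assigned to a good Main decomposition in the same rollout, which is one plausible reading of how role decoupling could reduce gradient conflict.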