

Unified Thinker: A General Reasoning Modular Core for Image Generation

January 6, 2026
Authors: Sashuai Zhou, Qiang Zhou, Jijin Hu, Hanqing Yang, Yue Cao, Junpeng Ma, Yinchao Ma, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng, Zhou Zhao
cs.AI

Abstract

Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning-execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting how far current open-source models lag behind. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, so reasoning can be upgraded modularly without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
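The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Plan` type, the `Thinker` and `Generator` signatures, and the toy functions are all hypothetical names chosen for illustration, and images are stood in for by plain strings. The sketch only shows the structural idea that a planning component emits a structured plan which any generator exposing the same interface can consume.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical structured plan: an ordered list of concrete steps."""
    steps: List[str]

    def to_prompt(self) -> str:
        # Serialize the plan into a single conditioning string.
        return "; ".join(self.steps)


# Hypothetical interfaces (assumptions, not from the paper):
# a Thinker maps an instruction to a Plan, and a Generator maps a
# prompt string to an "image" (a string placeholder in this sketch).
Thinker = Callable[[str], Plan]
Generator = Callable[[str], str]


def generate(instruction: str, thinker: Thinker, generator: Generator) -> str:
    """Plan first, then hand the serialized plan to whichever generator is plugged in."""
    plan = thinker(instruction)
    return generator(plan.to_prompt())


def toy_thinker(instruction: str) -> Plan:
    # Stand-in for a learned reasoning model: split the instruction on " and ".
    parts = [p.strip() for p in instruction.split(" and ") if p.strip()]
    return Plan(steps=[f"step {i + 1}: {p}" for i, p in enumerate(parts)])


def toy_generator_a(prompt: str) -> str:
    return f"<image A | {prompt}>"


def toy_generator_b(prompt: str) -> str:
    return f"<image B | {prompt}>"


if __name__ == "__main__":
    instruction = "draw a red cube and place it on a blue sphere"
    # The same thinker drives two different generators without modification.
    print(generate(instruction, toy_thinker, toy_generator_a))
    print(generate(instruction, toy_thinker, toy_generator_b))
```

Because the two generators share one call signature, swapping `toy_thinker` for a stronger planner leaves both untouched, which mirrors the modular-upgrade property the abstract claims. The reinforcement learning stage is deliberately left out of this sketch.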