
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

February 2, 2026
Authors: Dianyi Wang, Chaofan Ma, Feng Han, Size Wu, Wei Song, Yibin Wang, Zhixiong Zhang, Tianhang Wang, Siyuan Wang, Zhongyu Wei, Jiaqi Wang
cs.AI

Abstract

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, typically treating text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps. To address this, we propose UniReason, a unified framework that harmonizes the two tasks through a dual reasoning paradigm. We formulate generation as world knowledge-enhanced planning that injects implicit constraints, and leverage editing capabilities for fine-grained visual refinement, correcting residual visual errors via self-reflection. This approach unifies generation and editing within a shared representation space, mirroring the human cognitive process of planning followed by refinement. To support the framework, we systematically construct a large-scale reasoning-centric dataset (~300k samples) spanning five major knowledge domains (e.g., cultural commonsense, physics) for planning, alongside an agent-generated corpus for visual self-correction. Extensive experiments demonstrate that UniReason achieves strong performance on reasoning-intensive benchmarks such as WISE, KrisBench, and UniREditBench, while maintaining superior general synthesis capabilities.
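The plan-then-refine process the abstract describes can be sketched as a simple control loop. This is a minimal illustrative sketch, not the paper's actual API: every function name here (`plan_with_world_knowledge`, `generate_image`, `critique`, `edit_image`) is a hypothetical placeholder standing in for capabilities of the unified model, and the "image" is mocked as a plain dictionary.

```python
# Hypothetical sketch of the dual reasoning loop described in the abstract:
# world-knowledge-enhanced planning, then iterative self-reflective editing.
# All callables are illustrative stubs, not UniReason's real interface.

def plan_with_world_knowledge(prompt):
    # Stage 1: expand the prompt with implicit constraints drawn from
    # world knowledge (cultural commonsense, physics, etc.).
    return f"{prompt} [+ implicit world-knowledge constraints]"

def generate_image(plan):
    # Stand-in for the unified model's text-to-image step; we mock an
    # image as a dict carrying a list of (detectable) visual errors.
    return {"plan": plan, "errors": ["wrong shadow direction"]}

def critique(image):
    # Self-reflection: inspect the output and report visual errors
    # (an empty list means nothing left to correct).
    return list(image["errors"])

def edit_image(image, errors):
    # Stage 2: fine-grained refinement via the editing capability,
    # removing the errors flagged by self-reflection.
    remaining = [e for e in image["errors"] if e not in errors]
    return {"plan": image["plan"], "errors": remaining}

def unireason_generate(prompt, max_rounds=3):
    plan = plan_with_world_knowledge(prompt)
    image = generate_image(plan)
    for _ in range(max_rounds):
        errors = critique(image)
        if not errors:            # converged: no visual errors detected
            break
        image = edit_image(image, errors)
    return image

result = unireason_generate("a sundial at noon")
print(result["errors"])  # → []
```

The point of the loop structure is that generation and editing share one representation (here, the same dict), so the self-reflection step can hand its critique directly back to the editor rather than restarting generation from scratch.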
PDF (771) · March 12, 2026