Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
February 2, 2026
Authors: Jun He, Junyan Ye, Zilong Huang, Dongzhi Jiang, Chenjue Zhang, Leqi Zhu, Renrui Zhang, Xiang Zhang, Weijia Li
cs.AI
Abstract
While text-to-image generation has achieved unprecedented fidelity, most existing models function fundamentally as static text-to-pixel decoders and consequently often fail to grasp implicit user intentions. Although emerging unified understanding-generation models have improved intent comprehension, they still struggle, within a single model, to accomplish tasks involving complex knowledge reasoning. Moreover, constrained by static internal priors, these models cannot adapt to the evolving dynamics of the real world. To bridge these gaps, we introduce Mind-Brush, an agentic framework that transforms generation into a dynamic, knowledge-driven workflow. Emulating a human-like "think-research-create" paradigm, Mind-Brush actively retrieves multimodal evidence to ground out-of-distribution concepts and employs reasoning tools to resolve implicit visual constraints. To rigorously evaluate these capabilities, we propose Mind-Bench, a comprehensive benchmark comprising 500 distinct samples spanning real-time news, emerging concepts, and domains such as mathematical and geographic reasoning. Extensive experiments demonstrate that Mind-Brush significantly enhances the capabilities of unified models, realizing a zero-to-one capability leap for the Qwen-Image baseline on Mind-Bench while achieving superior results on established benchmarks such as WISE and RISE.
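The "think-research-create" workflow described above can be sketched as a simple agentic loop. This is a minimal illustrative sketch, not the paper's actual implementation: every function name, heuristic, and data structure here (`think`, `research`, `create`, the quoted-word heuristic for unfamiliar concepts, and the `search_tool`/`generator` callables) is an assumption made for illustration.

```python
# Hypothetical sketch of a "think-research-create" agentic loop.
# All names and heuristics are illustrative assumptions, not Mind-Brush's API.

def think(prompt: str) -> dict:
    """Decompose the prompt, flagging concepts that may need grounding."""
    # Toy heuristic: treat any quoted word as an out-of-distribution
    # concept requiring external evidence.
    unknown = [w.strip('"') for w in prompt.split() if w.startswith('"')]
    return {"prompt": prompt, "unknown_concepts": unknown}

def research(plan: dict, search_tool) -> dict:
    """Retrieve evidence for each flagged concept via a search tool."""
    return {c: search_tool(c) for c in plan["unknown_concepts"]}

def create(plan: dict, evidence: dict, generator) -> str:
    """Compose an evidence-grounded prompt and invoke the generator."""
    grounding = "; ".join(f"{c}: {e}" for c, e in evidence.items())
    final_prompt = plan["prompt"]
    if grounding:
        final_prompt += f" [context: {grounding}]"
    return generator(final_prompt)

def mind_brush_loop(prompt: str, search_tool, generator) -> str:
    """Run the full think -> research -> create pipeline once."""
    plan = think(prompt)
    evidence = research(plan, search_tool)
    return create(plan, evidence, generator)
```

In a real system, `search_tool` would be a multimodal retriever and `generator` a diffusion or unified image model; here both are left as injected callables so the control flow of the loop stays visible.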