Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
February 2, 2026
Authors: Jun He, Junyan Ye, Zilong Huang, Dongzhi Jiang, Chenjue Zhang, Leqi Zhu, Renrui Zhang, Xiang Zhang, Weijia Li
cs.AI
Abstract
While text-to-image generation has achieved unprecedented fidelity, most existing models function fundamentally as static text-to-pixel decoders and consequently often fail to grasp implicit user intentions. Although emerging unified understanding-generation models have improved intent comprehension, they still struggle, within a single model, to accomplish tasks involving complex knowledge reasoning. Moreover, constrained by static internal priors, these models cannot adapt to the evolving dynamics of the real world. To bridge these gaps, we introduce Mind-Brush, an agentic framework that transforms generation into a dynamic, knowledge-driven workflow. Emulating a human-like "think-research-create" paradigm, Mind-Brush actively retrieves multimodal evidence to ground out-of-distribution concepts and employs reasoning tools to resolve implicit visual constraints. To rigorously evaluate these capabilities, we propose Mind-Bench, a comprehensive benchmark comprising 500 distinct samples spanning real-time news, emerging concepts, and domains such as mathematical and geographic reasoning. Extensive experiments demonstrate that Mind-Brush significantly enhances the capabilities of unified models, realizing a zero-to-one capability leap for the Qwen-Image baseline on Mind-Bench while achieving superior results on established benchmarks such as WISE and RISE.
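The "think-research-create" workflow described above can be sketched as a simple agentic loop. This is a minimal illustrative sketch, not the paper's actual implementation: every function name, heuristic, and data structure here (`think`, `research`, `create`, the quoted-word heuristic for unfamiliar concepts, and the `search_tool`/`generator` callables) is an assumption made for illustration.

```python
# Hypothetical sketch of a "think-research-create" agentic loop.
# All names and heuristics are illustrative assumptions, not Mind-Brush's API.

def think(prompt: str) -> dict:
    """Decompose the prompt, flagging concepts that may need grounding."""
    # Toy heuristic: treat any quoted word as an out-of-distribution
    # concept requiring external evidence.
    unknown = [w.strip('"') for w in prompt.split() if w.startswith('"')]
    return {"prompt": prompt, "unknown_concepts": unknown}

def research(plan: dict, search_tool) -> dict:
    """Retrieve evidence for each flagged concept via a search tool."""
    return {c: search_tool(c) for c in plan["unknown_concepts"]}

def create(plan: dict, evidence: dict, generator) -> str:
    """Compose an evidence-grounded prompt and invoke the generator."""
    grounding = "; ".join(f"{c}: {e}" for c, e in evidence.items())
    final_prompt = plan["prompt"]
    if grounding:
        final_prompt += f" [context: {grounding}]"
    return generator(final_prompt)

def mind_brush_loop(prompt: str, search_tool, generator) -> str:
    """Run the full think -> research -> create pipeline once."""
    plan = think(prompt)
    evidence = research(plan, search_tool)
    return create(plan, evidence, generator)
```

In a real system, `search_tool` would be a multimodal retriever and `generator` a diffusion or unified image model; here both are left as injected callables so the control flow of the loop stays visible.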