
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

June 11, 2025
Authors: Zhenran Xu, Yiyu Wang, Xue Yang, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
cs.AI

Abstract

AI-generated content has evolved from monolithic models to modular workflows, particularly on platforms like ComfyUI, enabling customization of creative pipelines. However, crafting effective workflows requires considerable expertise to orchestrate numerous specialized components, presenting a steep learning curve for users. To address this challenge, we introduce ComfyUI-R1, the first large reasoning model for automated workflow generation. Starting from our curated dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning data covering node selection, workflow planning, and code-level workflow representation. ComfyUI-R1 is trained with a two-stage framework: (1) CoT fine-tuning as a cold start, adapting the model to the ComfyUI domain; and (2) reinforcement learning to incentivize reasoning capability, guided by a fine-grained rule-metric hybrid reward that enforces format validity, structural integrity, and node-level fidelity. Experiments show that our 7B-parameter model achieves a 97% format validity rate, along with high pass rates and high node-level and graph-level F1 scores, significantly surpassing prior state-of-the-art methods that employ leading closed-source models such as GPT-4o and the Claude series. Further analysis highlights the critical role of the reasoning process and the advantage of representing workflows as code. Qualitative comparison shows our model's strength in synthesizing intricate workflows with diverse nodes, underscoring the potential of long CoT reasoning in AI art creation.
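The abstract reports node-level and graph-level F1 scores but does not define them. The following is a minimal sketch under stated assumptions, not the paper's evaluation code. It assumes that node-level F1 is a multiset F1 over node class types, and it approximates a graph-level comparison by applying the same F1 to typed (source, target) edges. The function name `multiset_f1` and the example workflow are hypothetical. The node names are common ComfyUI node types, and the paper's exact matching rules may differ.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def multiset_f1(predicted: Iterable[Hashable], reference: Iterable[Hashable]) -> float:
    """Hypothetical metric: F1 between two multisets, so duplicates count
    (e.g. a workflow with two CLIPTextEncode nodes)."""
    pred = Counter(predicted)
    ref = Counter(reference)
    # Counter & Counter keeps the minimum count per key, i.e. the matched items.
    matched = sum((pred & ref).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference workflow: a basic text-to-image graph.
reference_nodes = [
    "CheckpointLoaderSimple", "CLIPTextEncode", "CLIPTextEncode",
    "EmptyLatentImage", "KSampler", "VAEDecode", "SaveImage",
]
# Hypothetical model output that omits the EmptyLatentImage node.
predicted_nodes = [n for n in reference_nodes if n != "EmptyLatentImage"]

# Assumption: edges compared as (source node type, target node type) pairs.
reference_edges = [
    ("CheckpointLoaderSimple", "KSampler"),
    ("CheckpointLoaderSimple", "CLIPTextEncode"),
    ("CheckpointLoaderSimple", "CLIPTextEncode"),
    ("CheckpointLoaderSimple", "VAEDecode"),
    ("CLIPTextEncode", "KSampler"),
    ("CLIPTextEncode", "KSampler"),
    ("EmptyLatentImage", "KSampler"),
    ("KSampler", "VAEDecode"),
    ("VAEDecode", "SaveImage"),
]
# The missing node also drops its outgoing edge.
predicted_edges = [e for e in reference_edges if "EmptyLatentImage" not in e]

node_f1 = multiset_f1(predicted_nodes, reference_nodes)  # 12/13
edge_f1 = multiset_f1(predicted_edges, reference_edges)  # 16/17
print(f"node F1 = {node_f1:.3f}, edge F1 = {edge_f1:.3f}")  # node F1 = 0.923, edge F1 = 0.941
```

Using a multiset rather than a set matters for workflows like this one, where the positive and negative prompts are two separate nodes of the same type. With a plain set, omitting one of them would go unpenalized.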
June 12, 2025