ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
June 11, 2025
Authors: Zhenran Xu, Yiyu Wang, Xue Yang, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
cs.AI
Abstract
AI-generated content has evolved from monolithic models to modular workflows,
particularly on platforms like ComfyUI, enabling customization in creative
pipelines. However, crafting effective workflows requires substantial expertise to
orchestrate numerous specialized components, presenting a steep learning curve
for users. To address this challenge, we introduce ComfyUI-R1, the first large
reasoning model for automated workflow generation. Starting with our curated
dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning
data, including node selection, workflow planning, and code-level workflow
representation. ComfyUI-R1 is trained through a two-stage framework: (1) CoT
fine-tuning for cold start, adapting models to the ComfyUI domain; (2)
reinforcement learning for incentivizing reasoning capability, guided by a
fine-grained rule-metric hybrid reward, ensuring format validity, structural
integrity, and node-level fidelity. Experiments show that our 7B-parameter
model achieves a 97% format validity rate, along with a high pass rate and
high node-level and graph-level F1 scores, significantly surpassing prior
state-of-the-art methods that employ leading closed-source models such as
GPT-4o and the Claude series. Further analysis highlights the critical role of the
reasoning process and the advantage of transforming workflows into code.
Qualitative comparison reveals our strength in synthesizing intricate workflows
with diverse nodes, underscoring the potential of long CoT reasoning in AI art
creation.
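The abstract names a rule-metric hybrid reward covering format validity, structural integrity, and node-level fidelity, and reports node-level F1, but it does not define them. The sketch below is a hypothetical illustration of how such pieces could fit together, not the authors' implementation. It assumes a simplified JSON workflow schema (a `nodes` object keyed by id, each with a `class_type`, plus `links` as source/destination id pairs) instead of the paper's code-level representation. It treats node-level F1 as the multiset overlap of node class names and uses arbitrary penalties of -1.0 and -0.5. The helper names `node_f1` and `hybrid_reward` are invented for this example.

```python
import json
from collections import Counter


def node_f1(pred_nodes, ref_nodes):
    """Node-level F1, assuming each workflow is a multiset of node class names."""
    pred, ref = Counter(pred_nodes), Counter(ref_nodes)
    overlap = sum((pred & ref).values())  # matched nodes, respecting multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def hybrid_reward(output_text, ref_nodes):
    """Hypothetical rule-metric hybrid reward: rule checks gate a metric score."""
    # Rule 1 (format validity): output must parse as JSON in the assumed schema.
    try:
        workflow = json.loads(output_text)
    except json.JSONDecodeError:
        return -1.0
    nodes = workflow.get("nodes") if isinstance(workflow, dict) else None
    if not isinstance(nodes, dict) or not nodes:
        return -1.0
    if not all(isinstance(spec, dict) for spec in nodes.values()):
        return -1.0
    # Rule 2 (structural integrity): every link must join two existing node ids.
    links = workflow.get("links", [])
    if not isinstance(links, list):
        return -0.5
    for link in links:
        if not isinstance(link, list) or len(link) != 2:
            return -0.5
        if any(str(end) not in nodes for end in link):
            return -0.5
    # Metric (node-level fidelity): F1 of predicted vs. reference node types.
    pred_nodes = [spec.get("class_type") for spec in nodes.values()]
    return node_f1(pred_nodes, ref_nodes)


# Usage: a well-formed prediction that omits one of two text-encoder nodes.
good = json.dumps({
    "nodes": {
        "1": {"class_type": "CheckpointLoaderSimple"},
        "2": {"class_type": "CLIPTextEncode"},
        "3": {"class_type": "KSampler"},
    },
    "links": [[1, 2], [1, 3], [2, 3]],
})
ref = ["CheckpointLoaderSimple", "CLIPTextEncode", "CLIPTextEncode", "KSampler"]
print(hybrid_reward(good, ref))  # precision 3/3, recall 3/4, so F1 = 6/7
print(hybrid_reward("not json", ref))  # format check fails: -1.0
```

Gating the metric behind the rule checks means a malformed output always scores below any well-formed one, which is one plausible reason to combine rules with metrics. The checks and weights actually used to train ComfyUI-R1 may differ.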