ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Abstract

With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising way to unify diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks often remain fragile and struggle to support complex real-world applications because they lack structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system built on the ComfyUI platform and designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations. The first is the Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors. The second is a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks, ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems. Project page: https://github.com/LitaoGuo/ComfyMind
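
As a rough illustration of the planning idea described in the abstract, the following Python sketch treats each workflow as a callable module with a natural-language description and searches over module choices depth-first, falling back to a sibling option when a step fails. It is a toy sketch under stated assumptions, not ComfyMind's implementation: the `Workflow` class, the `plan` function, the string-valued states, and the module names are all hypothetical, and no ComfyUI API is used.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Workflow:
    """Hypothetical stand-in for one semantic workflow module."""
    name: str
    description: str  # natural-language summary a planner could read
    run: Callable[[str], Optional[str]]  # returns the new state, or None on failure


def plan(state: str, goal: str, library: list[Workflow], depth: int = 3) -> Optional[list[str]]:
    """Depth-first search over module choices with per-step feedback.

    A failed step (run returns None) only prunes that branch; the search
    then tries the next sibling at the same level instead of restarting.
    """
    if state == goal:
        return []
    if depth == 0:
        return None
    for wf in library:
        result = wf.run(state)
        if result is None:  # execution-level feedback: this module failed here
            continue
        rest = plan(result, goal, library, depth - 1)
        if rest is not None:
            return [wf.name] + rest
    return None


# Toy library (assumed for illustration): states are plain strings.
LIBRARY = [
    Workflow("flaky_text_to_image", "Generate an image from a prompt (always fails here).",
             lambda s: None),
    Workflow("text_to_image", "Generate an image from a text prompt.",
             lambda s: "image" if s == "prompt" else None),
    Workflow("edit_image", "Apply an instruction-based edit to an image.",
             lambda s: "edited_image" if s == "image" else None),
]

if __name__ == "__main__":
    print(plan("prompt", "edited_image", LIBRARY))
```

Here `plan("prompt", "edited_image", LIBRARY)` skips the failing first module and selects `text_to_image` followed by `edit_image`, so a failure is handled at the step where it occurs rather than by discarding the whole plan.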
