ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Abstract

The practical use of text-to-image generation has evolved from simple, monolithic models to complex workflows that combine multiple specialized components. While workflow-based approaches can lead to improved image quality, crafting effective workflows requires significant expertise, owing to the large number of available components, their complex inter-dependence, and their dependence on the generation prompt. Here, we introduce the novel task of prompt-adaptive workflow generation, where the goal is to automatically tailor a workflow to each user prompt. We propose two LLM-based approaches to tackle this task: a tuning-based method that learns from user-preference data, and a training-free method that uses the LLM to select existing flows. Both approaches lead to improved image quality when compared to monolithic models or generic, prompt-independent workflows. Our work shows that prompt-dependent flow prediction offers a new pathway to improving text-to-image generation quality, complementing existing research directions in the field.
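To make the task concrete: prompt-adaptive workflow generation maps each prompt to its own workflow instead of running one fixed pipeline for every prompt. The sketch below is a hypothetical, training-free stand-in for the selection idea. Rather than querying an LLM, it tags the prompt with categories by simple keyword matching and picks the candidate flow with the highest summed per-category score. The `Flow` type, the flow names, the categories, the keyword lists and the scores are all illustrative assumptions, not code or data from this work.

```python
import re
from dataclasses import dataclass


@dataclass
class Flow:
    """A candidate workflow with hypothetical per-category quality scores."""
    name: str
    category_scores: dict[str, float]


def prompt_categories(prompt: str, keywords: dict[str, set[str]]) -> set[str]:
    """Tag a prompt with every category whose keyword set it mentions."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return {cat for cat, kws in keywords.items() if words & kws}


def select_flow(prompt: str, flows: list[Flow],
                keywords: dict[str, set[str]], default: Flow) -> Flow:
    """Pick the flow whose scores, summed over the prompt's categories, are highest.

    Assumption: a keyword rule stands in for the LLM selection step.
    Falls back to a generic, prompt-independent flow when no category matches.
    """
    cats = prompt_categories(prompt, keywords)
    if not cats:
        return default
    return max(flows, key=lambda f: sum(f.category_scores.get(c, 0.0) for c in cats))


# Illustrative data only (hypothetical flow names, categories and scores).
KEYWORDS = {
    "people": {"man", "woman", "portrait", "face"},
    "landscape": {"mountain", "forest", "lake", "sunset"},
}
FLOWS = [
    Flow("generic_flow", {"people": 0.50, "landscape": 0.55}),
    Flow("portrait_flow", {"people": 0.80, "landscape": 0.40}),
    Flow("scenery_flow", {"people": 0.45, "landscape": 0.75}),
]

if __name__ == "__main__":
    for p in ["A portrait of an old man, dramatic lighting",
              "Sunset over a mountain lake",
              "A red cube on a table"]:
        print(p, "->", select_flow(p, FLOWS, KEYWORDS, FLOWS[0]).name)
```

The fallback to `generic_flow` plays the role of the prompt-independent baseline; in the training-free approach the abstract describes, the keyword rule would be replaced by an LLM that reads the prompt and chooses among existing flows.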