BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Abstract

Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing through a probabilistic blob-based representation. By using blobs as visual primitives, our approach decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions are: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies that balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels at a variety of element-level manipulation tasks while remaining computationally efficient, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/
