BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
March 17, 2025
Authors: Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou
cs.AI
Abstract
Element-level visual manipulation is essential in digital content creation,
but current diffusion-based methods lack the precision and flexibility of
traditional tools. In this work, we introduce BlobCtrl, a framework that
unifies element-level generation and editing using a probabilistic blob-based
representation. By employing blobs as visual primitives, our approach
effectively decouples and represents spatial location, semantic content, and
identity information, enabling precise element-level manipulation. Our key
contributions include: 1) a dual-branch diffusion architecture with
hierarchical feature fusion for seamless foreground-background integration; 2)
a self-supervised training paradigm with tailored data augmentation and score
functions; and 3) controllable dropout strategies to balance fidelity and
diversity. To support further research, we introduce BlobData for large-scale
training and BlobBench for systematic evaluation. Experiments show that
BlobCtrl excels in various element-level manipulation tasks while maintaining
computational efficiency, offering a practical solution for precise and
flexible visual content creation. Project page:
https://liyaowei-stu.github.io/project/BlobCtrl/
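
The abstract does not spell out how a blob is parameterized. One plausible reading of a "probabilistic blob" is a 2D Gaussian whose mean gives the element's position and whose covariance gives its size and orientation. The sketch below illustrates that reading only; the function names `blob_covariance` and `blob_opacity`, and the choice to render the blob as a soft opacity map, are assumptions made for illustration and are not taken from the BlobCtrl code.

```python
import numpy as np


def blob_covariance(semi_major, semi_minor, theta):
    """Build a 2x2 covariance from ellipse axes (in pixels) and a rotation angle.

    Hypothetical helper: the ellipse parameterization is an assumption.
    """
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ np.diag([semi_major**2, semi_minor**2]) @ rotation.T


def blob_opacity(center, cov, height, width):
    """Render a Gaussian blob as a soft opacity map with values in [0, 1].

    center is (x, y) in pixel coordinates; cov is a 2x2 covariance matrix.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Offset of every pixel from the blob center, shape (H, W, 2), (x, y) order.
    offsets = np.stack([xs - center[0], ys - center[1]], axis=-1).astype(float)
    inv_cov = np.linalg.inv(np.asarray(cov, dtype=float))
    # Squared Mahalanobis distance of each pixel from the center.
    dist_sq = np.einsum("hwi,ij,hwj->hw", offsets, inv_cov, offsets)
    return np.exp(-0.5 * dist_sq)


# Place an element, then "edit" it by moving and rotating the same blob.
cov = blob_covariance(semi_major=12.0, semi_minor=6.0, theta=np.pi / 6)
original = blob_opacity(center=(32, 24), cov=cov, height=48, width=64)
moved = blob_opacity(
    center=(40, 10),
    cov=blob_covariance(12.0, 6.0, -np.pi / 4),
    height=48,
    width=64,
)
```

Under this assumption, moving, resizing, or rotating an element amounts to changing a handful of numbers (the mean and covariance), which is one way spatial location could be kept separate from the semantic and identity information the abstract mentions.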