BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

March 17, 2025
Authors: Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou
cs.AI

Abstract

Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/
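
The abstract does not spell out how a blob is parameterized. As a rough illustration only, the sketch below assumes a blob is an elliptical 2D Gaussian (a center, two axis lengths, and a rotation angle) rendered as a soft opacity map; the function name `blob_opacity` and all parameters are hypothetical and are not taken from the BlobCtrl code. It shows why such a primitive keeps spatial location separate from content: moving or reshaping an element only changes a handful of numbers.

```python
import math

def blob_opacity(width, height, center, axes, angle):
    """Render an elliptical Gaussian blob as a height x width grid of values in (0, 1].

    Assumed parameterization (illustrative, not from the paper):
    center = (cx, cy) in pixels, axes = (a, b) standard deviations
    along the blob's own axes, angle = rotation in radians.
    """
    cx, cy = center
    a, b = axes
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Rotate the pixel offset into the blob's axis-aligned frame.
            u = cos_t * dx + sin_t * dy
            v = -sin_t * dx + cos_t * dy
            # Squared Mahalanobis distance from the blob center.
            d2 = (u / a) ** 2 + (v / b) ** 2
            row.append(math.exp(-0.5 * d2))
        grid.append(row)
    return grid

# An element at the image center, then the same element moved 8 pixels right.
blob = blob_opacity(64, 64, center=(32, 32), axes=(10, 5), angle=0.0)
moved = blob_opacity(64, 64, center=(40, 32), axes=(10, 5), angle=0.0)
```

In this toy setup, an edit such as moving, resizing, or rotating an element amounts to changing `center`, `axes`, or `angle` and re-rendering the map, while appearance and identity would be supplied separately by the generative model.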
