BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

March 17, 2025
Authors: Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou
cs.AI

Abstract

Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/
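
To make the idea of a "probabilistic blob" as a visual primitive more concrete, here is a minimal sketch. The abstract does not give the parameterization, so this assumes a blob is a 2D Gaussian (a center plus a covariance matrix) rasterized into a soft opacity map. The function names `blob_opacity`, `move_blob`, and `scale_blob` are hypothetical and are not taken from BlobCtrl. The sketch only illustrates how such a representation could keep spatial location separate from content, so that moving or resizing an element amounts to editing a few numbers.

```python
import numpy as np


def blob_opacity(center, cov, height, width):
    """Rasterize an (assumed) 2D Gaussian blob into a soft opacity map.

    center: (x, y) in pixel coordinates.
    cov:    2x2 symmetric positive-definite covariance matrix.
    Returns an array of shape (height, width) with values in (0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2) as (x, y)
    diff = pts - np.asarray(center, dtype=np.float64)
    inv_cov = np.linalg.inv(np.asarray(cov, dtype=np.float64))
    # Squared Mahalanobis distance of every pixel from the blob center.
    d2 = np.einsum("hwi,ij,hwj->hw", diff, inv_cov, diff)
    return np.exp(-0.5 * d2)


def move_blob(center, dx, dy):
    """Translate the blob; the covariance (shape) is untouched."""
    return (center[0] + dx, center[1] + dy)


def scale_blob(cov, factor):
    """Scale the blob's spatial extent by `factor` along both axes."""
    return np.asarray(cov, dtype=np.float64) * factor ** 2


# Usage: an axis-aligned blob, wider than it is tall, then moved right.
center = (32, 20)
cov = [[36.0, 0.0], [0.0, 9.0]]
opacity = blob_opacity(center, cov, 64, 64)
moved = blob_opacity(move_blob(center, 10, 0), cov, 64, 64)
```

In this sketch, position and shape live entirely in `center` and `cov`, which is one plausible reading of how a blob could decouple spatial layout from semantic content and identity; how BlobCtrl actually encodes and consumes blobs is described in the paper, not here.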
