PartCraft: Crafting Creative Objects by Parts

July 5, 2024
Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
cs.AI

Abstract

This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text- or sketch-based methods, we allow users, for the first time, to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures the selected visual concepts, ensuring a holistically faithful and plausible result. To achieve this, we first parse objects into parts through unsupervised feature clustering. Then, we encode parts into text tokens and introduce an entropy-based normalized attention loss that operates on them. This loss design enables our model to learn generic prior topology knowledge about an object's part composition, and to generalize further to novel part compositions so that the generation looks holistically faithful. Lastly, we employ a bottleneck encoder to project the part tokens. This not only enhances fidelity but also accelerates learning by leveraging shared knowledge and facilitating information exchange among instances. Visual results in the paper and supplementary material showcase the compelling power of PartCraft in crafting highly customized, innovative creations, exemplified by the "charming" and creative birds. Code is released at https://github.com/kamwoh/partcraft.
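
The first two steps of the abstract (parsing objects into parts by clustering features, then supervising per-part attention) can be illustrated with a small sketch. This is not the released PartCraft code. The abstract names neither the clustering algorithm nor the exact form of the loss, so the sketch assumes plain k-means for the clustering and a cross-entropy over attention maps normalized across part tokens for the loss. The function names `kmeans` and `normalized_attention_loss`, the toy 2-D "patch features", and the deterministic initialization are all hypothetical choices made for illustration.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means (an assumed stand-in for 'unsupervised feature clustering').

    points: list of feature vectors, one per image patch.
    Returns one cluster id (a pseudo part label) per patch.
    """
    # Assumption: deterministic init from evenly spaced points, for reproducibility.
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalized_attention_loss(attn, part_labels, eps=1e-8):
    """Assumed loss form: cross-entropy on attention normalized across part tokens.

    attn[t][i]: non-negative attention of part token t at spatial location i.
    part_labels[i]: index of the part token that should own location i.
    """
    num_tokens, num_locs = len(attn), len(attn[0])
    total = 0.0
    for i in range(num_locs):
        # Normalize over tokens so each location holds a distribution over parts.
        col_sum = sum(attn[t][i] for t in range(num_tokens)) + eps
        prob = attn[part_labels[i]][i] / col_sum
        total += -math.log(prob + eps)
    return total / num_locs


# Toy usage: six patches with 2-D features forming two obvious groups.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
parts = kmeans(features, k=2)
uniform_attn = [[0.5] * len(features) for _ in range(2)]
print(parts, normalized_attention_loss(uniform_attn, parts))
```

In this sketch the loss falls toward zero as each part token concentrates its attention on the patches assigned to its cluster, which is one plausible way to read how a loss on part tokens could teach a model where each part belongs.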
