BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Abstract

Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, where they may need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), like GPT-4V, to intelligently search the design action space to arrive at an answer that can satisfy a user's intent. Specifically, we design a vision-based edit generator and state evaluator to work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with "imagined" reference images from image-generation models, providing visual grounding of abstract language descriptions. We provide empirical evidence suggesting that our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials from text and/or reference images, as well as adjusting lighting configurations for product renderings in complex scenes.
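The interplay between an edit generator and a state evaluator can be illustrated with a toy search loop. The following is a minimal sketch under strong assumptions and is not the system's actual implementation: the state is a plain dictionary of numeric parameters rather than a Blender program, the generator perturbs values at random instead of querying a VLM, and the evaluator compares distances to a known target instead of judging renders against a user's intent. All names (`propose_edits`, `make_toy_evaluator`, `search`) are hypothetical.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for the editable parameters of a Blender program.
State = Dict[str, float]


def propose_edits(state: State, n: int, rng: random.Random) -> List[State]:
    """Toy edit generator: each candidate perturbs one parameter.

    Assumption: a real generator would ask a VLM for edits; this one is random.
    """
    candidates = []
    for _ in range(n):
        key = rng.choice(sorted(state))
        edited = dict(state)
        # Clamp the perturbed value to the [0, 1] range.
        edited[key] = min(1.0, max(0.0, edited[key] + rng.uniform(-0.2, 0.2)))
        candidates.append(edited)
    return candidates


def make_toy_evaluator(target: State) -> Callable[[State, State], State]:
    """Toy state evaluator: returns whichever of two states is closer to target.

    Assumption: a real evaluator would compare renders, not parameter values.
    """
    def distance(s: State) -> float:
        return sum((s[k] - target[k]) ** 2 for k in target)

    def pick_better(a: State, b: State) -> State:
        # Ties keep the first argument, so the incumbent is never made worse.
        return a if distance(a) <= distance(b) else b

    return pick_better


def search(
    initial: State,
    pick_better: Callable[[State, State], State],
    depth: int = 20,
    breadth: int = 8,
    seed: int = 0,
) -> State:
    """Iteratively propose `breadth` edits and keep the best, `depth` times."""
    rng = random.Random(seed)
    best = dict(initial)
    for _ in range(depth):
        for candidate in propose_edits(best, breadth, rng):
            best = pick_better(best, candidate)
    return best


if __name__ == "__main__":
    start = {"roughness": 0.9, "metallic": 0.1}
    goal = {"roughness": 0.2, "metallic": 0.8}
    print(search(start, make_toy_evaluator(goal)))
```

Because the evaluator only ever compares two states, the loop never needs an absolute score, and the incumbent state is kept whenever no proposed edit is judged better.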
