BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Abstract

Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually spend hours in software such as Blender, where they may need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences of operations, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), such as GPT-4V, to intelligently search the design action space for an answer that satisfies a user's intent. Specifically, we design a vision-based edit generator and a state evaluator that work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with "imagined" reference images from image-generation models, providing visual grounding for abstract language descriptions. We provide empirical evidence suggesting that our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials from text and/or reference images, and adjusting lighting configurations for product renderings in complex scenes.
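The interplay between an edit generator and a state evaluator can be illustrated with a toy search loop. This is only a minimal sketch under strong assumptions, not the system described in the abstract: the abstract does not specify the search procedure, so the greedy loop here is an assumption. The state is a plain dictionary of numeric parameters, the hypothetical `propose_edits` function stands in for the VLM-based edit generator, and a numeric `score` function stands in for the VLM-based state evaluator. No Blender or GPT-4V calls are made, and the names `propose_edits`, `search`, and `closeness` are invented for illustration.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, float]


def propose_edits(state: State, rng: random.Random,
                  n: int = 4, step: float = 0.2) -> List[State]:
    """Hypothetical stand-in for an edit generator.

    Each candidate copies the current state and perturbs one parameter.
    """
    candidates = []
    for _ in range(n):
        key = rng.choice(sorted(state))
        new_state = dict(state)
        new_state[key] = state[key] + rng.uniform(-step, step)
        candidates.append(new_state)
    return candidates


def search(initial: State, score: Callable[[State], float],
           iterations: int = 50, seed: int = 0) -> State:
    """Greedy search: generate candidates, keep whichever scores highest."""
    rng = random.Random(seed)
    best = initial
    for _ in range(iterations):
        candidates = propose_edits(best, rng)
        # The current best stays in the pool, so the score never decreases.
        best = max(candidates + [best], key=score)
    return best


# Assumed toy "intent": a target parameter setting that the evaluator
# scores against. A real evaluator would judge rendered images instead.
target = {"roughness": 0.8, "metallic": 0.1}


def closeness(state: State) -> float:
    """Negative squared distance to the target (higher is better)."""
    return -sum((state[k] - target[k]) ** 2 for k in target)


start = {"roughness": 0.2, "metallic": 0.9}
result = search(start, closeness)
```

Because the incumbent state is always kept as a candidate, the evaluator's score for `result` is never worse than for `start`. A real evaluator that compares rendered images would not necessarily offer such a guarantee.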
