FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

Abstract

Scene generation with 3D assets is a complex challenge that requires both high-level semantic understanding and low-level geometric reasoning. Multimodal Large Language Models (MLLMs) excel at semantic tasks, but their limited grounding in 3D geometry hinders their application to 3D scene generation. In this paper, we investigate how best to use MLLMs for object placement. To this end, we introduce FirePlace, a novel framework that uses existing MLLMs for (1) 3D geometric reasoning and extraction of the relevant geometric details from the 3D scene, (2) construction and solving of geometric constraints on the extracted low-level geometry, and (3) pruning to final placements that conform to common sense. By combining geometric reasoning with the real-world understanding of MLLMs, our method can propose object placements that satisfy both geometric constraints and high-level semantic, common-sense considerations. Our experiments show that these capabilities allow our method to place objects more effectively in scenes with intricate geometry, surpassing the quality of prior work.
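The three stages described above (extract geometry, solve constraints, prune by common sense) can be illustrated with a toy sketch. This is not FirePlace's implementation: it assumes a hypothetical axis-aligned rectangular support surface in place of the geometry an MLLM would extract, represents each constraint as a function returning a non-negative violation (0 means satisfied), solves by brute-force grid search, and uses a hand-written scoring function as a stand-in for MLLM-based common-sense pruning. All names (`Surface`, `inside`, `near`, `grid_candidates`, `solve`, `prune`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical representation: a constraint maps a candidate (x, y) to a
# non-negative violation, where 0.0 means the constraint is satisfied.
Constraint = Callable[[Point], float]


@dataclass
class Surface:
    """Hypothetical stand-in for a support surface extracted from the scene."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def inside(surface: Surface, half_w: float, half_d: float) -> Constraint:
    """The object's footprint (half-extents half_w, half_d) must stay on the surface."""
    def violation(p: Point) -> float:
        x, y = p
        dx = max(surface.x_min + half_w - x, 0.0, x - (surface.x_max - half_w))
        dy = max(surface.y_min + half_d - y, 0.0, y - (surface.y_max - half_d))
        return dx + dy
    return violation


def near(target: Point, max_dist: float) -> Constraint:
    """The object's center must lie within max_dist of a target point."""
    def violation(p: Point) -> float:
        d = ((p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2) ** 0.5
        return max(0.0, d - max_dist)
    return violation


def grid_candidates(surface: Surface, step: float) -> List[Point]:
    """Stage 1 stand-in: sample candidate positions on a regular grid."""
    nx = int(round((surface.x_max - surface.x_min) / step)) + 1
    ny = int(round((surface.y_max - surface.y_min) / step)) + 1
    return [(surface.x_min + i * step, surface.y_min + j * step)
            for i in range(nx) for j in range(ny)]


def solve(candidates: List[Point], constraints: List[Constraint],
          tol: float = 1e-9) -> List[Point]:
    """Stage 2: keep candidates whose total violation is (numerically) zero."""
    return [p for p in candidates if sum(c(p) for c in constraints) <= tol]


def prune(placements: List[Point], score: Callable[[Point], float],
          k: int) -> List[Point]:
    """Stage 3 stand-in: rank feasible placements by a common-sense score."""
    return sorted(placements, key=score, reverse=True)[:k]


if __name__ == "__main__":
    table = Surface(0.0, 2.0, 0.0, 1.0)
    cons = [inside(table, 0.25, 0.25), near((2.0, 0.5), 0.6)]
    feasible = solve(grid_candidates(table, 0.25), cons)
    # Hypothetical scorer: prefer positions centered in depth and close to x = 1.5.
    print(prune(feasible, lambda p: -abs(p[1] - 0.5) - abs(p[0] - 1.5), k=1))
```

Grid search keeps the sketch short and deterministic; a practical system would more likely search continuous positions and orientations, and would query the MLLM (rather than a fixed formula) when pruning.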
