MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse

March 24, 2025
Authors: Zhenyu Pan, Han Liu
cs.AI

Abstract

We present MetaSpatial, the first reinforcement learning (RL)-based framework designed to enhance 3D spatial reasoning in vision-language models (VLMs), enabling real-time 3D scene generation without hard-coded optimizations. MetaSpatial addresses two core challenges: (i) VLMs lack internalized 3D spatial reasoning, which limits their ability to generate realistic layouts, and (ii) traditional supervised fine-tuning (SFT) is inefficient for layout generation because perfect ground-truth annotations are unavailable. Our key innovation is a multi-turn RL-based optimization mechanism that integrates physics-aware constraints and rendered-image evaluations, ensuring that generated 3D layouts are coherent, physically plausible, and aesthetically consistent. Methodologically, MetaSpatial introduces an adaptive, iterative reasoning process in which the VLM refines spatial arrangements over multiple turns by analyzing rendered outputs, progressively improving scene coherence. Empirical evaluations demonstrate that MetaSpatial significantly improves the spatial consistency and formatting stability of models at various scales. After training, object placements are more realistic, better aligned, and more functionally coherent, validating the effectiveness of RL for 3D spatial reasoning in metaverse, AR/VR, digital twin, and game development applications. Our code, data, and training pipeline are publicly available at https://github.com/PzySeere/MetaSpatial.
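To make the idea of a physics-aware constraint concrete, the following is a minimal sketch of how a layout could be scored for physical plausibility. It is an illustration only, not the reward used by MetaSpatial: the `Box` schema, the room dimensions, and the `physics_reward` function are all hypothetical, and the check is reduced to 2D floor-plane footprints (objects must stay inside the room and must not overlap).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned floor footprint of one object (hypothetical schema)."""
    name: str
    x: float  # center along the room's width
    y: float  # center along the room's depth
    w: float  # footprint width
    d: float  # footprint depth


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two footprints; 0.0 if they do not intersect."""
    dx = min(a.x + a.w / 2, b.x + b.w / 2) - max(a.x - a.w / 2, b.x - b.w / 2)
    dy = min(a.y + a.d / 2, b.y + b.d / 2) - max(a.y - a.d / 2, b.y - b.d / 2)
    return max(dx, 0.0) * max(dy, 0.0)


def in_room(box: Box, room_w: float, room_d: float) -> bool:
    """True if the footprint lies entirely inside a room_w x room_d room."""
    return (
        box.x - box.w / 2 >= 0.0
        and box.x + box.w / 2 <= room_w
        and box.y - box.d / 2 >= 0.0
        and box.y + box.d / 2 <= room_d
    )


def physics_reward(layout: list[Box], room_w: float, room_d: float) -> float:
    """Fraction of satisfied constraints, in [0, 1] (assumed scoring rule)."""
    checks = [in_room(b, room_w, room_d) for b in layout]
    checks += [overlap_area(a, b) == 0.0 for a, b in combinations(layout, 2)]
    return sum(checks) / len(checks) if checks else 1.0


if __name__ == "__main__":
    bed = Box("bed", 1.0, 1.0, 2.0, 2.0)
    desk = Box("desk", 3.5, 1.0, 1.0, 1.0)
    chair = Box("chair", 1.5, 1.0, 1.0, 1.0)  # placed on top of the bed
    print(physics_reward([bed, desk], 4.0, 4.0))
    print(physics_reward([bed, chair], 4.0, 4.0))
```

In a multi-turn setting, a score of this kind could be recomputed after each refinement step and combined with a rendered-image evaluation; how MetaSpatial actually weights and combines these signals is described in the paper and repository, not here.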
