MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
March 24, 2025
Authors: Zhenyu Pan, Han Liu
cs.AI
Abstract
We present MetaSpatial, the first reinforcement learning (RL)-based framework
designed to enhance 3D spatial reasoning in vision-language models (VLMs),
enabling real-time 3D scene generation without the need for hard-coded
optimizations. MetaSpatial addresses two core challenges: (i) the lack of
internalized 3D spatial reasoning in VLMs, which limits their ability to
generate realistic layouts, and (ii) the inefficiency of traditional supervised
fine-tuning (SFT) for layout generation tasks, as perfect ground truth
annotations are unavailable. Our key innovation is a multi-turn RL-based
optimization mechanism that integrates physics-aware constraints and rendered
image evaluations, ensuring generated 3D layouts are coherent, physically
plausible, and aesthetically consistent. Methodologically, MetaSpatial
introduces an adaptive, iterative reasoning process, where the VLM refines
spatial arrangements over multiple turns by analyzing rendered outputs,
progressively improving scene coherence. Empirical evaluations demonstrate that
MetaSpatial significantly enhances the spatial consistency and formatting
stability of models at various scales. After training, object placements are more
realistic, aligned, and functionally coherent, validating the effectiveness of
RL for 3D spatial reasoning in metaverse, AR/VR, digital twins, and game
development applications. Our code, data, and training pipeline are publicly
available at https://github.com/PzySeere/MetaSpatial.
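
As a rough illustration of how physics-aware constraints and a rendered-image evaluation might be combined into one scalar reward for RL, consider the minimal sketch below. It is not taken from the MetaSpatial code base. The `Box` type, the axis-aligned overlap test, the room-bounds check, the equal weights, and the `render_score` input (assumed to be a number in [0, 1] produced by some external evaluator of the rendered scene) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Box:
    """Hypothetical axis-aligned object: center (x, y, z) and size (w, d, h)."""
    name: str
    x: float
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes interpenetrate on all three axes."""
    return (abs(a.x - b.x) * 2 < a.w + b.w
            and abs(a.y - b.y) * 2 < a.d + b.d
            and abs(a.z - b.z) * 2 < a.h + b.h)


def in_room(b: Box, room_w: float, room_d: float, room_h: float) -> bool:
    """True if the box lies fully inside a room spanning [0, size] per axis."""
    return (b.w / 2 <= b.x <= room_w - b.w / 2
            and b.d / 2 <= b.y <= room_d - b.d / 2
            and b.h / 2 <= b.z <= room_h - b.h / 2)


def physics_score(boxes: list[Box], room: tuple[float, float, float]) -> float:
    """Fraction of checks passed (pairwise collisions plus per-object bounds)."""
    if not boxes:
        return 0.0
    pairs = list(combinations(boxes, 2))
    collisions = sum(overlaps(a, b) for a, b in pairs)
    outside = sum(not in_room(b, *room) for b in boxes)
    return 1.0 - (collisions + outside) / (len(pairs) + len(boxes))


def layout_reward(boxes: list[Box], room: tuple[float, float, float],
                  render_score: float,
                  w_physics: float = 0.5, w_render: float = 0.5) -> float:
    """Weighted sum of the physics score and an externally supplied render score."""
    return w_physics * physics_score(boxes, room) + w_render * render_score
```

In a multi-turn setting, a reward of this kind could be recomputed after each refinement turn, so that a layout that removes a collision or moves an object back inside the room scores higher than the previous one.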