Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

Abstract

Training embodied AI agents depends critically on the visual fidelity of simulation environments and the ability to model dynamic humans. Current simulators rely on mesh-based rasterization, which offers limited visual realism, and their support for dynamic human avatars, where available, is constrained to mesh representations, hindering agent generalization to human-populated real-world scenarios. We present Habitat-GS, a navigation-centric embodied AI simulator extended from Habitat-Sim that integrates 3D Gaussian Splatting (3DGS) scene rendering and drivable Gaussian avatars while maintaining full compatibility with the Habitat ecosystem. Our system implements a 3DGS renderer for real-time photorealistic rendering and supports scalable 3DGS asset import from diverse sources. For dynamic human modeling, we introduce a Gaussian avatar module that enables each avatar to serve simultaneously as a photorealistic visual entity and an effective navigation obstacle, allowing agents to learn human-aware behaviors in realistic settings. Experiments on point-goal navigation demonstrate that agents trained on 3DGS scenes achieve stronger cross-domain generalization, with mixed-domain training being the most effective strategy. Evaluations on avatar-aware navigation further confirm that Gaussian avatars enable effective human-aware navigation. Finally, performance benchmarks validate the system's scalability across varying scene complexity and avatar counts.