GeoWorld: Geometric World Models

February 26, 2026
Authors: Zeyu Zhang, Danning Li, Ian Reid, Richard Hartley
cs.AI

Abstract

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, which leads to rapid degradation across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate success rate (SR) improvements of around 3% in 3-step planning and around 2% in 4-step planning compared to the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
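
To make the idea of mapping Euclidean latents onto a hyperbolic manifold concrete, here is a minimal sketch. The abstract does not say which hyperbolic model or mapping GeoWorld uses, so the Poincaré ball, the exponential map at the origin, the curvature parameter `c`, and the function names `expmap0` and `poincare_distance` are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Assumption: this is one standard way to send a Euclidean vector into
    hyperbolic space; the paper's actual mapping may differ.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    def sq_norm(u):
        return sum(a * a for a in u)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


# Hypothetical 2-D "latent" vector, mapped into the unit ball.
z_euclidean = [0.3, 0.4]
z_hyperbolic = expmap0(z_euclidean)
origin = [0.0, 0.0]
dist_from_origin = poincare_distance(origin, z_hyperbolic)
```

Under this convention the mapped point always lies strictly inside the ball, and its hyperbolic distance from the origin is twice the Euclidean norm of the input. Distances grow rapidly toward the boundary, which is the property usually cited as making hyperbolic space a natural fit for hierarchical structure.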