Fast LeWorldModel

Abstract

Joint-Embedding Predictive Architectures (JEPAs), including the recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to latent errors that accumulate as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a fast latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes the prefixes of that sequence and predicts, in parallel, the future latents reached after executing each prefix. By making action prefixes the basic prediction unit, Fast-LeWM directly models action effects that have accumulated to different extents over multiple horizons. This prefix-level supervision forces the model to learn how states evolve continuously under different action prefixes, rather than only fitting one-step state transitions. During planning, the predictor can use the last prefix token of the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success rate over LeWM while substantially reducing planning time. It also achieves lower open-loop latent loss, and that loss grows significantly more slowly as the rollout horizon increases.
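The contrast between autoregressive rollout and action-prefix prediction can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the abstract does not specify the networks, so random linear maps stand in for trained models, a cumulative sum stands in for a causal action encoder, and all names (`rollout_autoregressive`, `encode_prefixes`, `predict_from_prefixes`, `plan_cost`) are hypothetical. It is not the Fast-LeWM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM, HORIZON = 8, 2, 5

# Hypothetical random weights standing in for trained networks.
W_step = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))
W_act = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))
W_pred = rng.normal(scale=0.1, size=(2 * LATENT_DIM, LATENT_DIM))


def rollout_autoregressive(z0, actions):
    """Baseline: apply a one-step transition model once per action."""
    z, latents = z0, []
    for a in actions:  # one sequential model call per step
        z = np.tanh(np.concatenate([z, a]) @ W_step)
        latents.append(z)
    return np.stack(latents)  # shape (H, LATENT_DIM)


def encode_prefixes(actions):
    """Toy causal encoder: token t depends only on actions[0..t]."""
    return np.cumsum(actions @ W_act, axis=0)  # shape (H, LATENT_DIM)


def predict_from_prefixes(z0, prefix_tokens):
    """Predict the latent reached after each prefix in one batched call."""
    z0_tiled = np.broadcast_to(z0, prefix_tokens.shape)
    x = np.concatenate([z0_tiled, prefix_tokens], axis=-1)
    return np.tanh(x @ W_pred)  # one row per prefix


def plan_cost(z0, actions, z_goal):
    """Planning-time scoring: only the last prefix token is evaluated."""
    last_token = encode_prefixes(actions)[-1:]
    z_final = predict_from_prefixes(z0, last_token)[0]
    return float(np.sum((z_final - z_goal) ** 2))


z0 = rng.normal(size=LATENT_DIM)
actions = rng.normal(size=(HORIZON, ACTION_DIM))
z_goal = rng.normal(size=LATENT_DIM)

sequential = rollout_autoregressive(z0, actions)                # H chained calls
parallel = predict_from_prefixes(z0, encode_prefixes(actions))  # one batched call
cost = plan_cost(z0, actions, z_goal)
```

In this sketch the baseline feeds each predicted latent back into the next step, so any error in an early prediction propagates forward. The prefix variant always conditions on the true starting latent `z0` and never consumes its own outputs, and scoring a candidate plan needs only one predictor call on the final prefix token.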