FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

Abstract

World models learn to predict future states of an environment, enabling planning and mental simulation. Current approaches default to Transformer-based predictors operating in learned latent spaces. This comes at a cost: O(N^2) computation and no explicit spatial inductive bias. This paper asks a foundational question: is self-attention necessary for predictive world modeling, or can alternative computational substrates achieve comparable or superior results? I introduce FluidWorld, a proof-of-concept world model whose predictive dynamics are governed by partial differential equations (PDEs) of reaction-diffusion type. Instead of using a separate neural network predictor, the numerical integration of the PDE itself produces the future-state prediction. In a strictly parameter-matched three-way ablation on unconditional UCF-101 video prediction (64x64, ~800K parameters, identical encoder, decoder, losses, and data), FluidWorld is compared against a Transformer baseline (self-attention) and a ConvLSTM baseline (convolutional recurrence). While all three models converge to comparable single-step prediction loss, FluidWorld achieves 2x lower reconstruction error, produces representations with 10-15% higher spatial structure preservation and 18-25% higher effective dimensionality, and, critically, maintains coherent multi-step rollouts where both baselines degrade rapidly. All experiments were conducted on a single consumer-grade PC (Intel Core i5, NVIDIA RTX 4070 Ti), without any large-scale compute. These results suggest that PDE-based dynamics, which natively provide O(N) spatial complexity, adaptive computation, and global spatial coherence through diffusion, are a viable and parameter-efficient alternative to both attention and convolutional recurrence for world modeling.
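To make the idea of "prediction by PDE integration" concrete, the following is a minimal sketch of a generic reaction-diffusion update, not FluidWorld's actual implementation. It assumes a single-channel 2D grid, periodic boundaries, explicit Euler time stepping, and a hand-written cubic reaction term; in the model described above the state would be a learned latent representation and the reaction term would presumably be learned. The names `laplacian`, `reaction`, `step`, and `rollout`, and the values `diffusion=0.2` and `dt=0.1`, are illustrative assumptions.

```python
def laplacian(u):
    """5-point stencil Laplacian with periodic boundaries.

    Each cell reads only its four neighbours, so the cost is O(N)
    in the number of grid cells N.
    """
    h, w = len(u), len(u[0])
    return [
        [
            u[(i - 1) % h][j] + u[(i + 1) % h][j]
            + u[i][(j - 1) % w] + u[i][(j + 1) % w]
            - 4.0 * u[i][j]
            for j in range(w)
        ]
        for i in range(h)
    ]


def reaction(v):
    """Placeholder local reaction term (assumption: simple cubic nonlinearity)."""
    return v - v ** 3


def step(u, diffusion=0.2, dt=0.1):
    """One explicit Euler step of du/dt = D * laplacian(u) + R(u)."""
    lap = laplacian(u)
    return [
        [
            u[i][j] + dt * (diffusion * lap[i][j] + reaction(u[i][j]))
            for j in range(len(u[0]))
        ]
        for i in range(len(u))
    ]


def rollout(u0, n_steps):
    """Multi-step rollout: each integrated state is fed back as the next input."""
    states = [u0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


if __name__ == "__main__":
    # A single active cell spreads to its neighbours through diffusion.
    grid = [[0.0] * 8 for _ in range(8)]
    grid[4][4] = 1.0
    final = rollout(grid, 5)[-1]
    print(round(final[4][4], 4), round(final[4][5], 4))
```

In this sketch the diffusion term is what carries information across the grid over successive steps, while the reaction term acts pointwise. No separate predictor network appears: the state after integration is the prediction.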
