FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

Abstract

World models learn to predict future states of an environment, enabling planning and mental simulation. Current approaches default to Transformer-based predictors operating in learned latent spaces, which comes at a cost: O(N^2) computation and no explicit spatial inductive bias. This paper asks a foundational question: is self-attention necessary for predictive world modeling, or can alternative computational substrates achieve comparable or superior results? I introduce FluidWorld, a proof-of-concept world model whose predictive dynamics are governed by partial differential equations (PDEs) of reaction-diffusion type. Instead of relying on a separate neural network predictor, the model produces its future-state prediction directly by integrating the PDE. In a strictly parameter-matched three-way ablation on unconditional UCF-101 video prediction (64x64 resolution, ~800K parameters, identical encoder, decoder, losses, and data), FluidWorld is compared against a Transformer baseline (self-attention) and a ConvLSTM baseline (convolutional recurrence). All three models converge to comparable single-step prediction loss, but FluidWorld achieves 2x lower reconstruction error, produces representations with 10-15% higher spatial structure preservation and 18-25% higher effective dimensionality, and, critically, maintains coherent multi-step rollouts where both baselines degrade rapidly. All experiments were conducted on a single consumer-grade PC (Intel Core i5, NVIDIA RTX 4070 Ti), without any large-scale compute. These results establish that PDE-based dynamics, which natively provide O(N) spatial complexity, adaptive computation, and global spatial coherence through diffusion, are a viable and parameter-efficient alternative to both attention and convolutional recurrence for world modeling.
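To make the idea of "prediction by PDE integration" concrete, the following is a minimal sketch of a generic reaction-diffusion rollout on a 2D grid. It is not FluidWorld's implementation: the abstract does not specify the equations, so the 5-point Laplacian, the periodic boundaries, the explicit Euler scheme, the placeholder reaction term, and all names (`laplacian`, `default_reaction`, `step`, `rollout`, `diffusivity`, `dt`) are assumptions chosen for illustration. In a learned model the reaction term would presumably be a trained function rather than a fixed formula.

```python
"""Toy reaction-diffusion rollout on a 2D grid (illustrative assumptions only)."""
import math


def laplacian(u):
    """5-point discrete Laplacian; periodic boundaries are an assumption."""
    h, w = len(u), len(u[0])
    return [
        [
            u[(i - 1) % h][j] + u[(i + 1) % h][j]
            + u[i][(j - 1) % w] + u[i][(j + 1) % w]
            - 4.0 * u[i][j]
            for j in range(w)
        ]
        for i in range(h)
    ]


def default_reaction(x):
    """Hypothetical pointwise reaction term; a stand-in for a learned function."""
    return math.tanh(x) - x


def step(u, reaction=default_reaction, diffusivity=0.2, dt=0.1):
    """One explicit Euler step of du/dt = D * laplacian(u) + R(u).

    Each cell reads only its four neighbours, so the cost per step is
    linear in the number of grid cells.
    """
    lap = laplacian(u)
    return [
        [
            u[i][j] + dt * (diffusivity * lap[i][j] + reaction(u[i][j]))
            for j in range(len(u[0]))
        ]
        for i in range(len(u))
    ]


def rollout(u, n_steps, reaction=default_reaction, diffusivity=0.2, dt=0.1):
    """Integrate forward n_steps; the final grid plays the role of the prediction."""
    for _ in range(n_steps):
        u = step(u, reaction=reaction, diffusivity=diffusivity, dt=dt)
    return u


if __name__ == "__main__":
    # A single active cell spreads across the grid as the PDE is integrated.
    grid = [[0.0] * 8 for _ in range(8)]
    grid[4][4] = 1.0
    future = rollout(grid, n_steps=50)
    print(max(map(max, future)))
```

With unit grid spacing, the explicit scheme is only stable when `diffusivity * dt` stays at or below 0.25, which is why the defaults are small. Repeated local steps let information reach distant cells, which is one way to read the abstract's claim of global coherence through diffusion at O(N) cost per step.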

