FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

Abstract

World models learn to predict future states of an environment, enabling planning and mental simulation. Current approaches default to Transformer-based predictors operating in learned latent spaces, at the cost of O(N^2) computation and no explicit spatial inductive bias. This paper asks a foundational question: is self-attention necessary for predictive world modeling, or can alternative computational substrates achieve comparable or superior results? I introduce FluidWorld, a proof-of-concept world model whose predictive dynamics are governed by partial differential equations (PDEs) of reaction-diffusion type. Instead of using a separate neural network predictor, the PDE integration itself produces the future-state prediction. In a strictly parameter-matched three-way ablation on unconditional UCF-101 video prediction (64x64, ~800K parameters, identical encoder, decoder, losses, and data), FluidWorld is compared against a Transformer baseline (self-attention) and a ConvLSTM baseline (convolutional recurrence). While all three models converge to comparable single-step prediction loss, FluidWorld achieves 2x lower reconstruction error, produces representations with 10-15% higher spatial structure preservation and 18-25% more effective dimensionality, and, critically, maintains coherent multi-step rollouts where both baselines degrade rapidly. All experiments were conducted on a single consumer-grade PC (Intel Core i5, NVIDIA RTX 4070 Ti), without any large-scale compute. These results establish that PDE-based dynamics, which natively provide O(N) spatial complexity, adaptive computation, and global spatial coherence through diffusion, are a viable and parameter-efficient alternative to both attention and convolutional recurrence for world modeling.
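
To make concrete the idea that PDE integration itself produces the prediction, here is a minimal sketch of explicit integration of a textbook reaction-diffusion equation, du/dt = D * laplacian(u) + R(u), on a small 2D grid. It is not FluidWorld's implementation: the abstract does not give the model's equations, reaction term, discretization, boundary conditions, or hyperparameters. The cubic reaction term, periodic boundaries, forward-Euler scheme, and all names (`laplacian`, `cubic_reaction`, `step`, `rollout`, `diffusion`, `dt`) are assumptions chosen for illustration, and plain Python lists stand in for the array library a real model would use.

```python
"""Dependency-free sketch of explicit reaction-diffusion integration on a 2D grid."""


def laplacian(u):
    """5-point Laplacian with periodic boundaries (an assumed boundary condition).

    Each cell reads a fixed number of neighbours, so the cost is linear
    in the number of cells.
    """
    h, w = len(u), len(u[0])
    return [
        [
            u[(i - 1) % h][j] + u[(i + 1) % h][j]
            + u[i][(j - 1) % w] + u[i][(j + 1) % w]
            - 4.0 * u[i][j]
            for j in range(w)
        ]
        for i in range(h)
    ]


def cubic_reaction(x):
    """Hypothetical reaction term R(u) = u - u^3; a learned function could go here."""
    return x - x ** 3


def step(u, diffusion=0.2, dt=0.1, reaction=cubic_reaction):
    """One forward-Euler step of du/dt = D * laplacian(u) + R(u). Returns a new grid."""
    lap = laplacian(u)
    return [
        [
            u[i][j] + dt * (diffusion * lap[i][j] + reaction(u[i][j]))
            for j in range(len(u[0]))
        ]
        for i in range(len(u))
    ]


def rollout(u, n_steps, **kwargs):
    """Integrate forward n_steps; the final state plays the role of the prediction."""
    for _ in range(n_steps):
        u = step(u, **kwargs)
    return u


# Usage: a single bump at the centre spreads to its neighbours over time.
grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0
predicted = rollout(grid, n_steps=10)
```

With unit grid spacing, forward Euler is stable for the pure-diffusion part only when `diffusion * dt` is at most 1/4; the defaults above sit well inside that bound. Passing `reaction=lambda x: 0.0` reduces the sketch to pure diffusion, which conserves the total of the grid under periodic boundaries.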
