Envisioning the Future, One Step at a Time
April 10, 2026
Authors: Stefan Andreas Baumann, Jannik Wiese, Tommaso Martorella, Mahdi M. Kalayeh, Björn Ommer
cs.AI
Abstract
Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video or latent-space prediction, expending substantial capacity on dense appearance rather than on the underlying sparse trajectories of points in the scene. This makes large-scale exploration of future hypotheses costly and limits performance when long-horizon, multi-modal motion is essential. We address this by formulating the prediction of open-set future scene dynamics as step-wise inference over sparse point trajectories. Our autoregressive diffusion model advances these trajectories through short, locally predictable transitions, explicitly modeling the growth of uncertainty over time. This dynamics-centric representation enables fast rollout of thousands of diverse futures from a single image, optionally guided by initial constraints on motion, while maintaining physical plausibility and long-range coherence. We further introduce OWM, a benchmark for open-set motion prediction based on diverse in-the-wild videos, to evaluate accuracy and variability of predicted trajectory distributions under real-world uncertainty. Our method matches or surpasses dense simulators in predictive accuracy while achieving orders-of-magnitude higher sampling speed, making open-set future prediction both scalable and practical. Project page: http://compvis.github.io/myriad.
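The step-wise rollout idea described in the abstract can be illustrated with a toy sketch. This is not the authors' model: a simple Gaussian random walk stands in for the learned diffusion sampler, and all names (`sample_step`, `rollout`, `spread_over_time`, `step_std`) are hypothetical. It only shows how chaining short stochastic transitions over sparse points yields many cheap futures whose spread grows with the horizon.

```python
import random
import statistics


def sample_step(point, rng, step_std=1.0):
    """Stand-in for one short-horizon transition (NOT the paper's learned model).

    Draws a small Gaussian displacement for a single 2D point.
    """
    x, y = point
    return (x + rng.gauss(0.0, step_std), y + rng.gauss(0.0, step_std))


def rollout(start_points, num_steps, rng):
    """Advance a sparse set of points autoregressively.

    Each new state is sampled conditioned only on the previous state.
    """
    trajectory = [list(start_points)]
    for _ in range(num_steps):
        trajectory.append([sample_step(p, rng) for p in trajectory[-1]])
    return trajectory


def spread_over_time(start_points, num_steps, num_samples, seed=0):
    """Std. dev. of the first point's x-coordinate across samples, per step."""
    rng = random.Random(seed)
    futures = [rollout(start_points, num_steps, rng) for _ in range(num_samples)]
    return [
        statistics.pstdev([future[t][0][0] for future in futures])
        for t in range(num_steps + 1)
    ]


# Two tracked points, 2000 sampled futures of 16 steps each.
spread = spread_over_time([(0.0, 0.0), (5.0, 2.0)], num_steps=16, num_samples=2000)
```

In this toy setting the spread at step 0 is zero, since every sample shares the same starting points, and it widens as steps accumulate. A learned sampler would replace `sample_step` with a transition conditioned on the image and the motion history.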