

Trace Anything: Representing Any Video in 4D via Trajectory Fields

October 15, 2025
作者: Xinhang Liu, Yuxi Xiao, Donny Y. Chen, Jiashi Feng, Yu-Wing Tai, Chi-Keung Tang, Bingyi Kang
cs.AI

Abstract

Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, our model predicts a set of control points that parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at arbitrary query time instants. We trained the Trace Anything model on large-scale 4D data, including data from our new platform, and our experiments demonstrate that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion. Project page: https://trace-anything.github.io/.
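The key idea above is that a pixel's motion is parameterized by a handful of predicted control points defining a B-spline, so its 3D position can be read off at any continuous query time. The sketch below (an illustration, not the authors' code) evaluates such a trajectory with de Boor's algorithm, assuming a cubic spline with a uniform clamped knot vector on [0, 1]; the helper names `clamped_knots` and `de_boor` are hypothetical.

```python
# Sketch: evaluate one pixel's 3D trajectory from B-spline control points.
# Assumes a clamped uniform knot vector on [0, 1]; cubic degree by default.

def clamped_knots(num_ctrl, degree):
    """Uniform clamped knot vector on [0, 1] for num_ctrl control points."""
    n_inner = num_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)

def de_boor(t, ctrl, degree=3):
    """3D position on the trajectory at query time t in [0, 1].

    ctrl: list of predicted 3D control points (len >= degree + 1).
    """
    knots = clamped_knots(len(ctrl), degree)
    # Find the knot span k with knots[k] <= t < knots[k+1]
    # (t == 1.0 falls back to the last valid span).
    k = degree
    while k < len(ctrl) - 1 and not (knots[k] <= t < knots[k + 1]):
        k += 1
    # De Boor's recurrence: repeatedly blend neighboring points.
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            denom = knots[i + degree - r + 1] - knots[i]
            alpha = 0.0 if denom == 0.0 else (t - knots[i]) / denom
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return d[degree]

# With exactly degree+1 control points the spline reduces to a Bezier curve,
# so endpoints are interpolated and t = 0.5 gives the Bezier midpoint.
path = [(0, 0, 0), (1, 2, 0), (2, 2, 1), (3, 0, 2)]
start = de_boor(0.0, path)   # first control point
mid = de_boor(0.5, path)     # (1.5, 1.5, 0.625) for this Bezier case
end = de_boor(1.0, path)     # last control point
```

Evaluating the spline at a dense grid of `t` values recovers the full continuous trajectory for that pixel; in the paper's setting the network predicts the control points for every pixel in every frame in one forward pass.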