Trace Anything: Representing Any Video in 4D via Trajectory Fields
October 15, 2025
Authors: Xinhang Liu, Yuxi Xiao, Donny Y. Chen, Jiashi Feng, Yu-Wing Tai, Chi-Keung Tang, Bingyi Kang
cs.AI
Abstract
Effective spatio-temporal representation is fundamental to modeling,
understanding, and predicting dynamics in videos. The atomic unit of a video,
the pixel, traces a continuous 3D trajectory over time, serving as the
primitive element of dynamics. Based on this principle, we propose representing
any video as a Trajectory Field: a dense mapping that assigns a continuous 3D
trajectory function of time to each pixel in every frame. With this
representation, we introduce Trace Anything, a neural network that predicts the
entire trajectory field in a single feed-forward pass. Specifically, for each
pixel in each frame, our model predicts a set of control points that
parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at
arbitrary query time instants. We trained the Trace Anything model on
large-scale 4D data, including data from our new platform, and our experiments
demonstrate that: (i) Trace Anything achieves state-of-the-art performance on
our new benchmark for trajectory field estimation and performs competitively on
established point-tracking benchmarks; (ii) it offers significant efficiency
gains thanks to its one-pass paradigm, without requiring iterative optimization
or auxiliary estimators; and (iii) it exhibits emergent abilities, including
goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion.
Project page: https://trace-anything.github.io/.
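The abstract describes each pixel's trajectory as a B-spline determined by a predicted set of control points, so a 3D position can be read off at any query time. As a minimal sketch of that parameterization, the snippet below evaluates a uniform cubic B-spline trajectory from per-pixel control points; the knot spacing, spline order, and function name are assumptions for illustration, since the abstract does not specify them.

```python
import numpy as np

def eval_cubic_bspline(ctrl_pts: np.ndarray, t: float) -> np.ndarray:
    """Evaluate a uniform cubic B-spline trajectory at normalized time t in [0, 1].

    ctrl_pts: (K, 3) array of predicted 3D control points for one pixel (K >= 4).
    Returns the pixel's 3D position at time t. The uniform-knot cubic
    parameterization here is a hypothetical choice, not the paper's exact one.
    """
    K = ctrl_pts.shape[0]
    n_seg = K - 3                       # number of cubic segments
    s = min(t * n_seg, n_seg - 1e-9)    # map t to segment coordinate
    i = int(s)                          # index of the active segment
    u = s - i                           # local parameter in [0, 1)
    # Standard uniform cubic B-spline basis matrix (with 1/6 factor).
    M = (1.0 / 6.0) * np.array([
        [-1.0,  3.0, -3.0, 1.0],
        [ 3.0, -6.0,  3.0, 0.0],
        [-3.0,  0.0,  3.0, 0.0],
        [ 1.0,  4.0,  1.0, 0.0],
    ])
    U = np.array([u**3, u**2, u, 1.0])
    # Blend the four control points governing this segment.
    return U @ M @ ctrl_pts[i:i + 4]
```

Because the basis functions form a partition of unity, a handful of control points yields a smooth, continuous trajectory, which is what lets the model answer position queries at arbitrary (even unseen) time instants from one feed-forward prediction.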