Test-Time Training on Video Streams
July 11, 2023
Authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
cs.AI
Abstract
Prior work has established test-time training (TTT) as a general framework to
further improve a trained model at test time. Before making a prediction on
each test instance, the model is trained on the same instance using a
self-supervised task, such as image reconstruction with masked autoencoders. We
extend TTT to the streaming setting, where multiple test instances - video
frames in our case - arrive in temporal order. Our extension is online TTT: The
current model is initialized from the previous model, then trained on the
current frame and a small window of frames immediately before. Online TTT
significantly outperforms the fixed-model baseline for four tasks, on three
real-world datasets. The relative improvement is 45% and 66% for instance and
panoptic segmentation. Surprisingly, online TTT also outperforms its offline
variant that accesses more information, training on all frames from the entire
test video regardless of temporal order. This differs from previous findings
using synthetic videos. We conceptualize locality as the advantage of online
over offline TTT. We analyze the role of locality with ablations and a theory
based on bias-variance trade-off.
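The online TTT loop described above (initialize the model for frame t from the model for frame t-1, then take a few self-supervised gradient steps on the current frame and a small window of preceding frames) can be sketched as follows. This is a toy illustration, not the paper's implementation: a linear map with a masked-reconstruction loss stands in for a masked autoencoder, and all names (`online_ttt`, `masked_recon_loss_grad`) and hyperparameters are illustrative assumptions.

```python
import numpy as np

def masked_recon_loss_grad(W, x, mask):
    """Toy self-supervised task: reconstruct a frame x (a flat vector)
    from its masked version with a linear map W. Stands in for the
    masked-autoencoder reconstruction task in the paper."""
    x_masked = x * mask
    pred = W @ x_masked
    err = pred - x
    loss = 0.5 * np.mean(err ** 2)
    grad = np.outer(err, x_masked) / x.size  # d(loss)/dW
    return loss, grad

def online_ttt(frames, window=3, steps=4, lr=0.1, mask_ratio=0.5, seed=0):
    """Online TTT sketch: the model for frame t starts from the model
    for frame t-1 and is trained on a sliding window of recent frames."""
    rng = np.random.default_rng(seed)
    d = frames[0].size
    W = np.eye(d)  # stand-in for the pretrained model's weights
    losses = []
    for t in range(len(frames)):
        # train on the current frame and the frames immediately before it
        window_frames = frames[max(0, t - window + 1): t + 1]
        for _ in range(steps):
            x = window_frames[rng.integers(len(window_frames))]
            mask = (rng.random(d) > mask_ratio).astype(float)
            _, grad = masked_recon_loss_grad(W, x, mask)
            W -= lr * grad  # W carries over to frame t + 1 (online)
        # after adapting, evaluate on the current frame before moving on
        mask = (rng.random(d) > mask_ratio).astype(float)
        loss, _ = masked_recon_loss_grad(W, frames[t], mask)
        losses.append(loss)
    return W, losses
```

The offline variant discussed in the abstract would instead train one model on all frames of the test video regardless of order; the sliding `window` here is what gives online TTT its locality.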