Video Occupancy Models
June 25, 2024
Authors: Manan Tomar, Philippe Hansen-Estruch, Philip Bachman, Alex Lamb, John Langford, Matthew E. Taylor, Sergey Levine
cs.AI
Abstract
We introduce a new family of video prediction models designed to support
downstream control tasks. We call these models Video Occupancy models (VOCs).
VOCs operate in a compact latent space, thus avoiding the need to make
predictions about individual pixels. Unlike prior latent-space world models,
VOCs directly predict the discounted distribution of future states in a single
step, thus avoiding the need for multistep roll-outs. We show that both
properties are beneficial when building predictive models of video for use in
downstream control. Code is available at
https://github.com/manantomar/video-occupancy-models.
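The "discounted distribution of future states" that VOCs predict in a single step is the gamma-discounted state-occupancy measure. A minimal sketch of the standard Monte Carlo view of this target, assuming a trajectory of latent states (the helper `sample_discounted_future` is illustrative, not part of the paper's codebase): drawing an offset k ~ Geometric(1 - gamma) and returning s_{t+k} yields a sample from the discounted occupancy, which a one-step model can be trained to match without multistep roll-outs.

```python
import numpy as np

def sample_discounted_future(trajectory, t, gamma=0.99, rng=None):
    """Sample from the gamma-discounted future-state distribution.

    For mu(s' | s_t) proportional to sum_{k>=1} gamma^(k-1) * P(s_{t+k} = s'),
    a Monte Carlo sample is obtained by drawing an offset
    k ~ Geometric(1 - gamma) (so k >= 1) and returning s_{t+k},
    clipped to the end of the finite trajectory.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = rng.geometric(1.0 - gamma)          # offset into the future, k >= 1
    idx = min(t + k, len(trajectory) - 1)   # clip at trajectory end
    return trajectory[idx]

# Example: with gamma = 0 the offset is always 1, so the sample
# is simply the next state in the trajectory.
traj = list(range(10))
nxt = sample_discounted_future(traj, t=2, gamma=0.0)
```

Larger gamma shifts probability mass toward states farther in the future, which is why a model of this distribution summarizes long-horizon dynamics in one prediction.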