CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Abstract

Generating high-quality 360° panoramic videos from perspective input is a key application for virtual reality (VR), where high resolution is especially important for an immersive experience. Existing methods are constrained by the computational cost of vanilla diffusion models: they support native generation only at resolutions of ≤ 1K and rely on suboptimal post-hoc super-resolution to reach higher resolutions. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into six-face cubemap representations, CubeComposer synthesizes content autoregressively in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address the challenges of multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows for coherent synthesis; (2) a cube face context management mechanism, equipped with a sparse context attention design to improve efficiency; and (3) continuity-aware techniques, including cube-aware positional encoding, padding, and blending, to eliminate boundary seams. Extensive experiments on benchmark datasets demonstrate that CubeComposer outperforms state-of-the-art methods in native resolution and visual quality, supporting practical VR application scenarios. Project page: https://lg-li.github.io/project/cubecomposer
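
To make the idea of generating across cube faces and time windows more concrete, here is a minimal illustrative sketch. It is not the authors' algorithm: the abstract does not specify the actual generation order or the context selection rule. The face order, the time-major schedule, and the helpers `neighbors`, `schedule`, `context_for`, and `plan` are all hypothetical. The only fact relied on is cube geometry: each face shares an edge with four others and has one opposite face.

```python
# Hypothetical sketch of a spatio-temporal autoregressive plan over cube faces.
# The face order and context rule below are assumptions, not taken from the paper.

FACES = ["front", "right", "back", "left", "up", "down"]
OPPOSITE = {
    "front": "back", "back": "front",
    "right": "left", "left": "right",
    "up": "down", "down": "up",
}


def neighbors(face):
    """Faces sharing an edge with `face`: all except itself and its opposite."""
    return [f for f in FACES if f != face and f != OPPOSITE[face]]


def schedule(num_windows, face_order=FACES):
    """Assumed time-major order: finish all six faces of a window, then advance."""
    return [(t, f) for t in range(num_windows) for f in face_order]


def context_for(step, done):
    """Assumed sparse context: already-generated edge-adjacent faces in the
    same time window, plus the same face from the previous window."""
    t, face = step
    ctx = [(t, n) for n in neighbors(face) if (t, n) in done]
    if (t - 1, face) in done:
        ctx.append((t - 1, face))
    return ctx


def plan(num_windows):
    """Return a list of (step, context) pairs in generation order."""
    done = set()
    steps = []
    for step in schedule(num_windows):
        steps.append((step, context_for(step, done)))
        done.add(step)
    return steps


if __name__ == "__main__":
    for step, ctx in plan(2):
        print(step, "<-", ctx)
```

In this sketch each step conditions on at most five earlier blocks rather than on everything generated so far. That is one simple way to picture why a sparse context could keep memory bounded as the number of faces and windows grows.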
