Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Abstract

We present a novel approach for generating high-quality, spatio-temporally coherent human videos from a single image under arbitrary viewpoints. Our framework combines the strengths of U-Nets for accurate condition injection and diffusion transformers for capturing global correlations across viewpoints and time. The core is a cascaded 4D transformer architecture that factorizes attention across views, time, and spatial dimensions, enabling efficient modeling of the 4D space. Precise conditioning is achieved by injecting human identity, camera parameters, and temporal signals into the respective transformers. To train this model, we curate a multi-dimensional dataset spanning images, videos, multi-view data, and 3D/4D scans, along with a multi-dimensional training strategy. Our approach overcomes the limitations of previous methods based on GANs or U-Net-based diffusion models, which struggle with complex motions and viewpoint changes. Through extensive experiments, we demonstrate our method's ability to synthesize realistic, coherent, free-view human videos, paving the way for advanced multimedia applications in areas such as virtual reality and animation. Our project website is https://human4dit.github.io.
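
To make the idea of factorized attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (self_attention, attend_along, factorized_4d_block, attention_pair_counts), the tensor layout (views, time, spatial tokens, channels), the spatial-then-view-then-time ordering, and the use of single-head attention without learned projections are all assumptions chosen for illustration. The sketch only shows how attending along one axis at a time replaces a single attention pass over every token in the 4D volume.

```python
import numpy as np


def self_attention(x):
    """Single-head scaled dot-product self-attention over axis -2.

    x has shape (..., L, C). Queries, keys, and values are all x itself
    (no learned projections), which is an illustrative simplification.
    """
    scale = 1.0 / np.sqrt(x.shape[-1])
    scores = (x @ np.swapaxes(x, -1, -2)) * scale      # (..., L, L)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ x                                 # (..., L, C)


def attend_along(x, axis):
    """Self-attention along one axis of x with assumed shape (V, T, N, C).

    All other leading axes are treated as batch dimensions, so tokens
    only exchange information along the chosen axis.
    """
    moved = np.moveaxis(x, axis, -2)    # chosen axis becomes the sequence axis
    out = self_attention(moved)
    return np.moveaxis(out, -2, axis)   # restore the original layout


def factorized_4d_block(x):
    """One hypothetical block: spatial, then view, then temporal attention."""
    x = x + attend_along(x, 2)  # spatial tokens within each (view, frame)
    x = x + attend_along(x, 0)  # across views at each (frame, token)
    x = x + attend_along(x, 1)  # across time at each (view, token)
    return x


def attention_pair_counts(V, T, N):
    """Query-key pairs scored by full 4D attention vs. the factorized form."""
    tokens = V * T * N
    full = tokens ** 2
    factorized = tokens * (N + V + T)
    return full, factorized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8, 16, 32))   # 4 views, 8 frames, 16 tokens, 32 channels
    y = factorized_4d_block(x)
    print(y.shape)
    print(attention_pair_counts(4, 8, 16))
```

With 4 views, 8 frames, and 16 spatial tokens, full attention would score 262144 query-key pairs, while the factorized form scores 14336. The gap widens as any of the three dimensions grows, which is the efficiency argument behind factorizing attention.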
