TPDiff: Temporal Pyramid Video Diffusion Model

Abstract

The development of video diffusion models unveils a significant challenge: substantial computational demands. To mitigate this challenge, we observe that the reverse process of diffusion is inherently entropy-reducing. Given the inter-frame redundancy of the video modality, maintaining the full frame rate in high-entropy stages is unnecessary. Based on this insight, we propose TPDiff, a unified framework that improves both training and inference efficiency. By dividing diffusion into several stages, our framework progressively increases the frame rate along the diffusion process, with only the last stage operating at the full frame rate, thereby reducing computational cost. To train the multi-stage diffusion model, we introduce a dedicated training framework: stage-wise diffusion. By solving the partitioned probability flow ordinary differential equations (ODEs) of diffusion under aligned data and noise, our training strategy is applicable to various diffusion forms and further enhances training efficiency. Comprehensive experimental evaluations validate the generality of our method, demonstrating a 50% reduction in training cost and a 1.5x improvement in inference efficiency.
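To make the staged frame-rate idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, purely for illustration, that the frame count doubles from one stage to the next and that the cost of a denoising step grows linearly with the number of frames. Neither assumption is stated in the abstract. The function names `stage_frame_counts` and `relative_cost` are hypothetical.

```python
def stage_frame_counts(full_frames: int, num_stages: int) -> list[int]:
    """Frames processed in each stage, earliest (noisiest) stage first.

    Assumption for illustration only: the frame count doubles between
    consecutive stages, and the last stage uses every frame.
    """
    counts = []
    for k in range(num_stages):
        factor = 2 ** (num_stages - 1 - k)  # temporal downsampling factor
        counts.append(max(1, full_frames // factor))
    return counts


def relative_cost(full_frames: int, num_stages: int,
                  steps_per_stage: list[int]) -> float:
    """Cost of the staged schedule relative to full frame rate throughout.

    Assumption for illustration only: per-step cost is linear in the
    number of frames. Real attention layers can scale faster than that.
    """
    counts = stage_frame_counts(full_frames, num_stages)
    staged = sum(c * s for c, s in zip(counts, steps_per_stage))
    baseline = full_frames * sum(steps_per_stage)
    return staged / baseline


if __name__ == "__main__":
    print(stage_frame_counts(16, 3))
    print(relative_cost(16, 3, [10, 10, 10]))
```

Under these toy assumptions, a 16-frame clip split into three equal-length stages is processed at 4, 8, and 16 frames. The resulting ratio only illustrates where the savings come from and is not a reproduction of the 50% and 1.5x figures reported above.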
