EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

Abstract

We propose EverAnimate, an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be synthesized against relatively static environments, which makes chunk-based generation prone to two kinds of accumulated drift: (i) low-level quality drift, such as progressive degradation of static backgrounds, and (ii) high-level semantic drift, such as inconsistent character identity and view-dependent attributes. To address this, EverAnimate restores drifted flow trajectories by anchoring generation to a persistent latent context memory, using two complementary mechanisms. (i) Persistent Latent Propagation maintains a context memory across chunks, propagating identity and motion in latent space while mitigating temporal forgetting. (ii) Restorative Flow Matching introduces an implicit restoration objective during sampling through velocity adjustment, improving within-chunk fidelity. With only lightweight LoRA tuning, EverAnimate outperforms state-of-the-art long-animation methods in both short- and long-horizon settings: at 10 seconds, it improves PSNR/SSIM by 8%/7% and reduces LPIPS/FID by 22%/11%; at 90 seconds, the gains increase to 15%/15% and 32%/27%, respectively.
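
To make the drift argument concrete, the following toy sketch simulates chunk-by-chunk sampling in a tiny latent space. The abstract does not give the actual update rules, so everything here is an illustrative assumption rather than the EverAnimate method: the velocity field is a hand-written stand-in for a learned model, `bias` is a hypothetical per-chunk error, `memory` is held fixed at the reference latent, and the restoration term with weight `lam` is one plausible way to "adjust velocity toward a memory anchor".

```python
import numpy as np


def sample_chunk(context, memory, lam, bias, steps=8, rng=None):
    """Euler-integrate a toy flow from noise (t=0) to a chunk latent (t=1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal(context.shape)  # each chunk starts from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Toy stand-in for a learned velocity field: it aims at the previous
        # chunk's latent plus a small systematic error (the source of drift).
        v = (context + bias - x) / (1.0 - t)
        # Endpoint implied by the current velocity on a straight-line path.
        x1_pred = x + (1.0 - t) * v
        # ASSUMED restoration term: steer the implied endpoint toward memory.
        v = v + lam * (memory - x1_pred) / (1.0 - t)
        x = x + dt * v
    return x


def generate(num_chunks, lam, dim=4, bias=0.1):
    """Generate chunks sequentially and record drift from the reference."""
    rng = np.random.default_rng(0)
    reference = np.zeros(dim)   # toy latent of the reference frame
    memory = reference.copy()   # persistent context, fixed in this sketch
    context = reference.copy()  # latent handed from one chunk to the next
    drift = []
    for _ in range(num_chunks):
        context = sample_chunk(context, memory, lam, bias, rng=rng)
        drift.append(float(np.linalg.norm(context - reference)))
    return drift


baseline = generate(num_chunks=9, lam=0.0)  # plain chunk-to-chunk handoff
restored = generate(num_chunks=9, lam=0.5)  # anchored to persistent memory
print(f"final drift without restoration: {baseline[-1]:.3f}")
print(f"final drift with restoration:    {restored[-1]:.3f}")
```

In this toy setting the unanchored baseline accumulates the per-chunk error linearly with the number of chunks, while the anchored variant stays bounded because each chunk's endpoint is pulled partway back toward the memory. It says nothing about how the real method trades restoration against motion fidelity.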