Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Abstract

In this study, we introduce a methodology for human image animation that leverages a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques. The methodology uses the SMPL (Skinned Multi-Person Linear) model as the 3D human parametric model to establish a unified representation of body shape and pose. This enables accurate capture of intricate human geometry and motion characteristics from source videos. Specifically, we incorporate depth images, normal maps, and semantic maps rendered from SMPL sequences, alongside skeleton-based motion guidance, to enrich the conditioning of the latent diffusion model with comprehensive 3D shape and detailed pose attributes. A multi-layer motion fusion module with self-attention fuses the shape and motion latent representations in the spatial domain. Because the 3D human parametric model serves as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion. Experimental evaluations on benchmark datasets show that the methodology generates high-quality human animations that accurately capture both pose and shape variations. Our approach also shows superior generalization on the proposed in-the-wild dataset. Project page: https://fudan-generative-vision.github.io/champ.
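The Python sketch below gives a rough picture of fusing several guidance signals with self-attention in the spatial domain. It is not the authors' implementation. The following are all simplifying assumptions made for illustration:

- the hypothetical names `self_attention` and `fuse_guidance`;
- omitting learned query/key/value projections;
- showing only a single fusion layer;
- the "attend within each guidance map, then sum" ordering.

Each guidance map (depth, normal, semantic, skeleton) is assumed to be already encoded into a list of spatial tokens with the same shape.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = spatial tokens, columns = feature channels


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over one row of attention scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over spatial tokens.

    Simplifying assumption: Q = K = V = tokens (no learned projections),
    so every spatial position mixes information from all other positions.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of value vectors, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def fuse_guidance(guidance: Dict[str, Matrix]) -> Matrix:
    """Hypothetical fusion: self-attention within each guidance map, then element-wise sum.

    `guidance` maps a condition name (e.g. "depth", "normal", "semantic",
    "skeleton") to its encoded tokens; all maps must share the same shape.
    """
    if not guidance:
        raise ValueError("at least one guidance map is required")
    attended = [self_attention(feats) for feats in guidance.values()]
    n, d = len(attended[0]), len(attended[0][0])
    return [[sum(m[i][c] for m in attended) for c in range(d)] for i in range(n)]


# Usage: two toy 2-token, 2-channel guidance maps.
depth = [[1.0, 0.0], [0.0, 1.0]]
normal = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse_guidance({"depth": depth, "normal": normal})
```

Summing the attended maps keeps the fused result the same shape as each input, so it can be combined with a latent of matching spatial size. A practical module would likely add learned projections, convolutional encoders per guidance type, and several stacked layers.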