SCAIL-2: Unifying Controlled Character Animation via End-to-End In-Context Conditioning

Abstract

Controlled character animation requires transferring motion from a driving sequence to a reference character. Prior work relies heavily on intermediate representations, such as pose skeletons to represent motion or masked backgrounds to represent the environment, which inevitably leads to information loss. To address this, we present SCAIL-2, a framework that bypasses these intermediates and achieves end-to-end character animation. By directly concatenating the driving video to the sequence, the model can obtain all the required visual information from the input video. To address the lack of end-to-end data, we unify the sub-tasks of character animation under decoupled conditions and then build a pipeline to synthesize MotionPair-60K, an end-to-end motion transfer dataset covering heterogeneous character animation tasks. To achieve this unification, we use in-context mask conditioning and mode-specific RoPE as soft guidance beyond textual instructions and raw visual information. To address discrepancies in the synthetic data in detailed regions, we propose Bias-Aware DPO, which constructs preference items to mitigate these errors. Extensive experiments demonstrate that our method substantially outperforms existing state-of-the-art approaches across various character animation tasks. A large subset of the synthetic data, as well as the model weights, will be released on our project page: https://teal024.github.io/SCAIL-2/.
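For readers unfamiliar with preference optimization, the following is a minimal sketch of the standard DPO loss on a single preference pair. It is not the Bias-Aware DPO proposed here, whose construction of preference items the abstract does not detail. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be scalar log-likelihoods of the preferred and rejected samples under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Hypothetical helper for illustration only; not the paper's
    Bias-Aware DPO. All inputs are scalar log-likelihoods.
    """
    # How much more the policy favors the preferred sample over the
    # rejected one, measured relative to the frozen reference model.
    margin = ((policy_logp_win - ref_logp_win)
              - (policy_logp_lose - ref_logp_lose))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy matches the reference, the loss is log(2).
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
# When the policy favors the preferred sample more than the reference
# does, the loss falls below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.5)
```

The loss decreases as the policy widens its margin for the preferred sample relative to the reference, which is the mechanism a preference-based correction of synthetic-data errors would rely on.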