Kling-MotionControl Technical Report
March 3, 2026
Authors: Kling Team, Jialu Chen, Yikang Ding, Zhixue Fang, Kun Gai, Kang He, Xu He, Jingyun Hua, Mingming Lao, Xiaohan Li, Hui Liu, Jiwen Liu, Xiaoqiang Liu, Fan Shi, Xiaoyu Shi, Peiqin Sun, Songlin Tang, Pengfei Wan, Tiancheng Wen, Zhiyong Wu, Haoxian Zhang, Runze Zhao, Yuanxing Zhang, Yan Zhou
cs.AI
Abstract
Character animation aims to generate lifelike videos by transferring motion dynamics from a driving video to a reference image. Recent strides in generative models have paved the way for high-fidelity character animation. In this work, we present Kling-MotionControl, a unified DiT-based framework engineered specifically for robust, precise, and expressive holistic character animation. Leveraging a divide-and-conquer strategy within a cohesive system, the model orchestrates heterogeneous motion representations tailored to the distinct characteristics of body, face, and hands, effectively reconciling large-scale structural stability with fine-grained articulatory expressiveness. To ensure robust cross-identity generalization, we incorporate adaptive identity-agnostic learning, facilitating natural motion retargeting for diverse characters ranging from realistic humans to stylized cartoons. Simultaneously, we guarantee faithful appearance preservation through meticulous identity injection and fusion designs, further supported by a subject library mechanism that leverages comprehensive reference contexts. To ensure practical utility, we implement an advanced acceleration framework utilizing multi-stage distillation, boosting inference speed by over 10x. Kling-MotionControl distinguishes itself through intelligent semantic motion understanding and precise text responsiveness, allowing for flexible control beyond visual inputs. Human preference evaluations demonstrate that Kling-MotionControl delivers superior performance compared to leading commercial and open-source solutions, achieving exceptional fidelity in holistic motion control, open-domain generalization, and visual quality and coherence. These results establish Kling-MotionControl as a robust solution for high-quality, controllable, and lifelike character animation.