
**Kling-MotionControl Technical Report**

March 3, 2026
Authors: Kling Team, Jialu Chen, Yikang Ding, Zhixue Fang, Kun Gai, Kang He, Xu He, Jingyun Hua, Mingming Lao, Xiaohan Li, Hui Liu, Jiwen Liu, Xiaoqiang Liu, Fan Shi, Xiaoyu Shi, Peiqin Sun, Songlin Tang, Pengfei Wan, Tiancheng Wen, Zhiyong Wu, Haoxian Zhang, Runze Zhao, Yuanxing Zhang, Yan Zhou
cs.AI

Abstract

Character animation aims to generate lifelike videos by transferring motion dynamics from a driving video to a reference image. Recent strides in generative models have paved the way for high-fidelity character animation. In this work, we present Kling-MotionControl, a unified DiT-based framework engineered specifically for robust, precise, and expressive holistic character animation. Leveraging a divide-and-conquer strategy within a cohesive system, the model orchestrates heterogeneous motion representations tailored to the distinct characteristics of body, face, and hands, effectively reconciling large-scale structural stability with fine-grained articulatory expressiveness. To ensure robust cross-identity generalization, we incorporate adaptive identity-agnostic learning, facilitating natural motion retargeting for diverse characters ranging from realistic humans to stylized cartoons. Simultaneously, we guarantee faithful appearance preservation through meticulous identity injection and fusion designs, further supported by a subject library mechanism that leverages comprehensive reference contexts. To ensure practical utility, we implement an advanced acceleration framework utilizing multi-stage distillation, boosting inference speed by over 10x. Kling-MotionControl distinguishes itself through intelligent semantic motion understanding and precise text responsiveness, allowing for flexible control beyond visual inputs. Human preference evaluations demonstrate that Kling-MotionControl delivers superior performance compared to leading commercial and open-source solutions, achieving exceptional fidelity in holistic motion control, open domain generalization, and visual quality and coherence. These results establish Kling-MotionControl as a robust solution for high-quality, controllable, and lifelike character animation.