PersonaLive! Expressive Portrait Image Animation for Live Streaming
December 12, 2025
Authors: Zhiyuan Li, Chi-Man Pun, Chen Fang, Jue Wang, Xiaodong Cun
cs.AI
Abstract
Current diffusion-based portrait animation models focus predominantly on enhancing visual quality and expression realism while overlooking generation latency and real-time performance, which restricts their applicability in live-streaming scenarios. We propose PersonaLive, a novel diffusion-based framework for streaming, real-time portrait animation trained with a multi-stage recipe. Specifically, we first adopt hybrid implicit signals, namely implicit facial representations and 3D implicit keypoints, to achieve expressive image-level motion control. We then propose a few-step appearance distillation strategy that eliminates appearance redundancy in the denoising process, greatly improving inference efficiency. Finally, we introduce an autoregressive micro-chunk streaming generation paradigm, equipped with a sliding training strategy and a historical keyframe mechanism, to enable low-latency and stable long-term video generation. Extensive experiments demonstrate that PersonaLive achieves state-of-the-art performance with a 7-22x speedup over prior diffusion-based portrait animation models.
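
To make the streaming paradigm concrete, below is a minimal toy sketch of how micro-chunk generation, few-step denoising, and a sliding historical-keyframe buffer could fit together. This is not the authors' implementation: the chunk size, keyframe count, step count, and the few_step_denoiser stand-in are all illustrative assumptions, and the real denoiser would be a learned video diffusion network conditioned on the hybrid implicit motion signals.

import torch

CHUNK_FRAMES = 4    # frames generated per micro-chunk (illustrative)
NUM_KEYFRAMES = 2   # historical keyframes kept as long-range context (illustrative)
NUM_STEPS = 4       # denoising steps after few-step distillation (illustrative)
LATENT_DIM = 8      # toy latent size

def few_step_denoiser(latents, appearance, motion, keyframes):
    """Stand-in for the distilled denoiser; one call is one denoising step."""
    cond = appearance + motion + keyframes.mean(dim=0)
    return latents + 0.25 * (cond - latents)  # toy update pulling latents toward the conditions

def stream_portrait(appearance, motion_stream):
    # Bootstrap the history buffer from the reference appearance.
    keyframes = appearance.unsqueeze(0).repeat(NUM_KEYFRAMES, 1)
    for motion in motion_stream:               # one motion signal per micro-chunk
        latents = torch.randn(CHUNK_FRAMES, LATENT_DIM)
        for _ in range(NUM_STEPS):             # few-step denoising keeps latency low
            latents = few_step_denoiser(latents, appearance, motion, keyframes)
        yield latents                          # emit the micro-chunk immediately
        # Slide the history: drop the oldest keyframe, append the newest frame.
        keyframes = torch.cat([keyframes[1:], latents[-1:].detach()])

ref = torch.zeros(LATENT_DIM)                                     # reference portrait latent
motions = (torch.full((LATENT_DIM,), 0.1 * t) for t in range(5))  # toy motion stream
for chunk in stream_portrait(ref, motions):
    print(chunk.shape)                                            # torch.Size([4, 8])

Because each micro-chunk is emitted as soon as its few denoising steps finish, and only a fixed-size keyframe buffer carries long-term context forward, latency stays bounded regardless of how long the stream runs.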