FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
December 18, 2025
Authors: Shuyuan Tu, Yueming Pan, Yinming Huang, Xintong Han, Zhen Xing, Qi Dai, Kai Qiu, Chong Luo, Zuxuan Wu
cs.AI
Abstract
Current diffusion-based acceleration methods for long portrait animation struggle to preserve identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer that synthesizes ID-preserving, infinite-length videos while achieving up to 6x faster inference. FlashPortrait first computes identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block that aligns facial features with diffusion latents by normalizing them with their respective means and variances, thereby improving identity stability in facial modeling. During inference, FlashPortrait adopts a dynamic sliding-window scheme with weighted blending in overlapping areas, ensuring smooth transitions and ID consistency in long animations. Within each context window, guided by the latent variation rate at particular timesteps and the derivative magnitude ratio among diffusion layers, FlashPortrait uses higher-order latent derivatives at the current timestep to directly predict latents at future timesteps, thereby skipping several denoising steps and achieving up to 6x acceleration. Experiments on benchmarks demonstrate the effectiveness of FlashPortrait both qualitatively and quantitatively.
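The three mechanisms named in the abstract (mean/variance alignment, weighted blending of overlapping windows, and derivative-based latent prediction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `align_statistics`, `blend_windows`, and `predict_latent` are hypothetical, and latents are reduced to flat lists of floats. The linear cross-fade weights and the use of backward finite differences as derivative estimates are assumptions made for illustration. The adaptive choice of which steps to skip, which the abstract bases on the latent variation rate and the layer-wise derivative magnitude ratio, is not modeled.

```python
import math
from statistics import fmean, pstdev


def align_statistics(features, latents, eps=1e-6):
    """Rescale `features` so their mean/std match those of `latents`.

    Assumption: plain statistics matching; the paper's Normalized Facial
    Expression Block may differ in detail.
    """
    f_mean, f_std = fmean(features), pstdev(features)
    l_mean, l_std = fmean(latents), pstdev(latents)
    return [(x - f_mean) / (f_std + eps) * l_std + l_mean for x in features]


def blend_windows(windows, overlap):
    """Stitch consecutive windows that share `overlap` frames.

    Assumption: a linear cross-fade; the weight of the newer window grows
    across the overlapping region.
    """
    out = list(windows[0])
    for win in windows[1:]:
        start = len(out) - overlap  # first overlapping frame in `out`
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight of the newer window
            out[start + i] = (1 - w) * out[start + i] + w * win[i]
        out.extend(win[overlap:])
    return out


def predict_latent(history, steps_ahead=1, order=2):
    """Extrapolate a latent value with a truncated Taylor expansion.

    `history` holds values at equally spaced past timesteps, oldest first.
    Derivatives are approximated by backward finite differences (assumption).
    """
    if len(history) < order + 1:
        raise ValueError("need at least order + 1 past values")
    diffs = list(history)
    prediction = 0.0
    for k in range(order + 1):
        # diffs[-1] is the k-th backward difference at the latest timestep
        prediction += diffs[-1] * steps_ahead ** k / math.factorial(k)
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return prediction
```

For a latent that changes linearly across timesteps, `predict_latent([0.0, 1.0, 2.0], steps_ahead=3)` extrapolates three steps ahead without evaluating the intermediate steps, which is the sense in which prediction replaces denoising steps. In a real model the same arithmetic would apply elementwise to latent tensors.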