
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

November 28, 2025
Authors: Zhizhou Zhong, Yicheng Ji, Zhe Kong, Yiying Liu, Jiarui Wang, Jiasun Feng, Lupeng Liu, Xiangyi Wang, Yanjia Li, Yuqing She, Ying Qin, Huan Li, Shuiyang Mao, Wei Liu, Wenhan Luo
cs.AI

Abstract

Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challenges due to the high cost of collecting diverse multi-person data and the difficulty of driving multiple identities with coherent interactivity. To address these challenges, we propose AnyTalker, a multi-person generation framework that features an extensible multi-stream processing architecture. Specifically, we extend the Diffusion Transformer's attention block with a novel identity-aware attention mechanism that iteratively processes identity-audio pairs, allowing the number of drivable identities to scale arbitrarily. In addition, although training multi-person generative models normally demands massive amounts of multi-person data, our training pipeline relies solely on single-person videos to learn multi-person speaking patterns and refines interactivity with only a few real multi-person clips. Furthermore, we contribute a targeted metric and dataset designed to evaluate the naturalness and interactivity of generated multi-person videos. Extensive experiments demonstrate that AnyTalker achieves remarkable lip synchronization, visual quality, and natural interactivity, striking a favorable balance between data cost and identity scalability.
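
To make the idea of iterating over identity-audio pairs more concrete, below is a minimal, hypothetical PyTorch sketch of an identity-aware attention block. It is not the authors' implementation: the module names, the use of per-identity spatial masks over video tokens, and all tensor shapes are assumptions chosen only to illustrate how adding an identity reduces to adding one more iteration of the same shared cross-attention step.

```python
# Illustrative sketch only (not AnyTalker's released code).
# Assumption: each identity is described by a spatial mask over the video tokens
# and an audio-feature sequence; hidden sizes and names are placeholders.
import torch
import torch.nn as nn

class IdentityAwareAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, video_tokens, identity_masks, audio_feats):
        """
        video_tokens:   (B, N, D) latent video tokens from the DiT stream
        identity_masks: list of (B, N) boolean masks, one per identity's region
        audio_feats:    list of (B, T, D) audio embeddings, one per identity
        """
        # Shared self-attention over all video tokens.
        x = video_tokens
        h = self.norm1(x)
        attn_out, _ = self.self_attn(h, h, h)
        x = x + attn_out

        # Iterate over identity-audio pairs: each identity's tokens cross-attend
        # only to that identity's audio, so adding an identity adds one iteration.
        for mask, audio in zip(identity_masks, audio_feats):
            cross_out, _ = self.audio_cross_attn(self.norm2(x), audio, audio)
            x = x + cross_out * mask.unsqueeze(-1).to(x.dtype)
        return x


# Usage with two identities (shapes are arbitrary placeholders).
if __name__ == "__main__":
    B, N, T, D = 1, 64, 20, 128
    block = IdentityAwareAttention(dim=D)
    tokens = torch.randn(B, N, D)
    masks = [torch.zeros(B, N, dtype=torch.bool) for _ in range(2)]
    masks[0][:, :32] = True   # identity 1 occupies the first half of the tokens
    masks[1][:, 32:] = True   # identity 2 occupies the second half
    audios = [torch.randn(B, T, D) for _ in range(2)]
    out = block(tokens, masks, audios)
    print(out.shape)  # torch.Size([1, 64, 128])
```

Because the cross-attention weights in this sketch are shared across identities and each identity only writes into its own masked region, the number of identities is not fixed by the architecture, which loosely mirrors the "arbitrary scaling of drivable identities" described in the abstract.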