OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

March 12, 2026
Authors: Yaofeng Su, Yuming Li, Zeyue Xue, Jie Huang, Siming Fu, Haoran Li, Ying Li, Zezhong Qian, Haoyang Huang, Nan Duan
cs.AI

Abstract

Recent joint audio-visual diffusion models achieve remarkable generation quality but suffer from high latency due to their bidirectional attention dependencies, which hinders real-time applications. We propose OmniForcing, the first framework to distill an offline, dual-stream bidirectional diffusion model into a high-fidelity streaming autoregressive generator. However, naively applying causal distillation to such dual-stream architectures triggers severe training instability, due to the extreme temporal asymmetry between modalities and the resulting token sparsity. We address the inherent information density gap by introducing an Asymmetric Block-Causal Alignment with a zero-truncation Global Prefix that prevents multi-modal synchronization drift. The gradient explosion caused by extreme audio token sparsity during the causal shift is further resolved through an Audio Sink Token mechanism equipped with an Identity RoPE constraint. Finally, a Joint Self-Forcing Distillation paradigm enables the model to dynamically self-correct cumulative cross-modal errors from exposure bias during long rollouts. Empowered by a modality-independent rolling KV-cache inference scheme, OmniForcing achieves state-of-the-art streaming generation at approximately 25 FPS on a single GPU, maintaining multi-modal synchronization and visual quality on par with the bidirectional teacher. Project page: https://omniforcing.com
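As a rough illustration of the block-causal idea mentioned in the abstract, the sketch below builds a boolean attention mask over video and audio tokens that are grouped into time blocks, with a small set of prefix tokens visible to every query. The abstract does not give block sizes, the prefix length, or the token layout, so the function names `build_tokens` and `block_causal_mask`, the interleaved layout, and all counts are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Tuple

Token = Tuple[int, str]  # (block id, modality label)


def build_tokens(num_blocks: int, video_per_block: int,
                 audio_per_block: int, prefix_len: int) -> List[Token]:
    """Lay out prefix tokens, then per-block video and audio tokens.

    Prefix tokens get block id -1 so that every later block can see them.
    The per-block token counts may differ between modalities (hypothetical values).
    """
    tokens: List[Token] = [(-1, "prefix")] * prefix_len
    for b in range(num_blocks):
        tokens += [(b, "video")] * video_per_block
        tokens += [(b, "audio")] * audio_per_block
    return tokens


def block_causal_mask(tokens: List[Token]) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    A query sees every key whose block id is not later than its own,
    regardless of modality, so attention is bidirectional within a block
    and causal across blocks.
    """
    return [[key_block <= query_block for (key_block, _) in tokens]
            for (query_block, _) in tokens]


# Two blocks, 3 video tokens and 1 audio token per block, 2 prefix tokens.
tokens = build_tokens(num_blocks=2, video_per_block=3,
                      audio_per_block=1, prefix_len=2)
mask = block_causal_mask(tokens)
```

In this layout the sparser modality simply contributes fewer tokens per block; how the paper actually aligns block boundaries across the two streams is not stated in the abstract.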