
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

April 29, 2025
Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin, Zhou Zhao
cs.AI

Abstract

Multimodal immersive spatial drama generation aims to create continuous multi-speaker binaural speech with dramatic prosody from multimodal prompts, with potential applications in AR, VR, and related areas. The task requires modeling spatial information and dramatic prosody simultaneously from multimodal inputs, and data collection is costly. To the best of our knowledge, our work is the first attempt to address these challenges. We construct MRSDrama, the first multimodal recorded spatial drama dataset, containing binaural drama audio, scripts, videos, geometric poses, and textual prompts. We then propose ISDrama, the first immersive spatial drama generation model driven by multimodal prompting. ISDrama has two primary components: 1) a Multimodal Pose Encoder, based on contrastive learning, which accounts for the Doppler effect caused by moving speakers and extracts unified pose information from multimodal prompts; 2) an Immersive Drama Transformer, a flow-based Mamba-Transformer model that generates high-quality drama and incorporates Drama-MOE to select suitable experts for enhanced prosody and pose control. We also design a context-consistent classifier-free guidance strategy to generate complete dramas coherently. Experimental results show that ISDrama outperforms baseline models on both objective and subjective metrics. The demos and dataset are available at https://aaronz345.github.io/ISDramaDemo.
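
The abstract mentions the Doppler effect caused by moving speakers but does not say how ISDrama models it. As a reminder of the underlying physics only, the sketch below computes the textbook Doppler-shifted frequency heard by a stationary listener from a moving source. The function names, the 3D position and velocity tuples, and the fixed speed of sound are illustrative assumptions, not part of ISDrama or MRSDrama.

```python
import math

# Assumption: speed of sound in air at roughly 20 °C, treated as a constant.
SPEED_OF_SOUND = 343.0  # metres per second


def radial_velocity(source_pos, source_vel, listener_pos):
    """Component of the source velocity pointing toward the listener.

    Positive means the source is approaching, negative means it is receding.
    """
    # Vector from the source to the listener.
    to_listener = [l - s for s, l in zip(source_pos, listener_pos)]
    distance = math.sqrt(sum(d * d for d in to_listener))
    if distance == 0.0:
        return 0.0  # direction undefined when source and listener coincide
    # Project the velocity onto the unit vector toward the listener.
    return sum(v * d for v, d in zip(source_vel, to_listener)) / distance


def doppler_frequency(f_source, source_pos, source_vel, listener_pos,
                      c=SPEED_OF_SOUND):
    """Observed frequency for a stationary listener and a moving source."""
    v_r = radial_velocity(source_pos, source_vel, listener_pos)
    if v_r >= c:
        raise ValueError("source approaches at or above the speed of sound")
    # Classical formula: f_obs = f_src * c / (c - v_r).
    return f_source * c / (c - v_r)


if __name__ == "__main__":
    # Hypothetical speaker emitting 440 Hz, walking at 10 m/s along x.
    toward = doppler_frequency(440.0, (0, 0, 0), (10, 0, 0), (5, 0, 0))
    away = doppler_frequency(440.0, (0, 0, 0), (10, 0, 0), (-5, 0, 0))
    print(f"approaching: {toward:.2f} Hz, receding: {away:.2f} Hz")
```

An approaching speaker is heard at a higher pitch and a receding one at a lower pitch, which is the cue a pose-aware encoder would plausibly need to capture for moving speakers.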
