ChatPaper.aiChatPaper

LumosX:關聯任意身份特徵與屬性實現個性化影片生成

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

March 20, 2026
作者: Jiazheng Xing, Fei Du, Hangjie Yuan, Pengwei Liu, Hongbin Xu, Hai Ci, Ruigang Niu, Weihua Chen, Fan Wang, Yong Liu
cs.AI

摘要

近期擴散模型的進展顯著提升了文字到視訊的生成能力,使個性化內容創作能夠對前景與背景元素進行細粒度控制。然而,跨主體的精準臉部屬性對齊仍具挑戰性,現有方法缺乏確保群組內一致性的顯式機制。解決這一難題需要顯式建模策略與臉部屬性感知數據資源的雙重突破。為此,我們提出LumosX框架,在數據與模型設計層面同步推進。在數據層面,通過定製化的採集流程協調獨立視訊中的描述文本與視覺線索,並藉助多模態大語言模型推斷並分配主體特定依賴關係。這些提取的關係先驗施加了更細粒度的結構,既增強了個性化視訊生成的表達控制力,也支持構建綜合性基準測試集。在建模層面,關係自注意力與關係交叉注意力機制將位置感知嵌入與優化的注意力動態交織融合,刻畫顯式的主體-屬性依賴關係,從而強化群組內凝聚力並放大不同主體集群間的區隔性。在我們構建的基準測試上的綜合評估表明,LumosX在細粒度、身份一致性及語義對齊的個性化多主體視訊生成任務中實現了最先進的性能。程式碼與模型已開源於:https://jiazheng-xing.github.io/lumosx-home/。
English
Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. Addressing this gap requires both explicit modeling strategies and face-attribute-aware data resources. We therefore propose LumosX, a framework that advances both data and model design. On the data side, a tailored collection pipeline orchestrates captions and visual cues from independent videos, while multimodal large language models (MLLMs) infer and assign subject-specific dependencies. These extracted relational priors impose a finer-grained structure that amplifies the expressive control of personalized video generation and enables the construction of a comprehensive benchmark. On the modeling side, Relational Self-Attention and Relational Cross-Attention intertwine position-aware embeddings with refined attention dynamics to inscribe explicit subject-attribute dependencies, enforcing disciplined intra-group cohesion and amplifying the separation between distinct subject clusters. Comprehensive evaluations on our benchmark demonstrate that LumosX achieves state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation. Code and models are available at https://jiazheng-xing.github.io/lumosx-home/.
PDF211March 24, 2026