BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

March 3, 2026
Authors: Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu
cs.AI

Abstract

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task poses three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents iteratively refine the user's prompt, leveraging the shared knowledge base and real-time contextual tracking to ensure brand visibility and semantic alignment. Experiments on 18 established and 2 custom brands across multiple state-of-the-art T2V models demonstrate that BrandFusion significantly outperforms baselines in semantic preservation, brand recognizability, and integration naturalness. Human evaluations further confirm higher user satisfaction, establishing a practical pathway for sustainable T2V monetization.