BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

Abstract

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task poses three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents jointly and iteratively refine user prompts, leveraging the shared knowledge base and real-time contextual tracking to ensure brand visibility and semantic alignment. Experiments on 18 established and 2 custom brands across multiple state-of-the-art T2V models demonstrate that BrandFusion significantly outperforms baselines in semantic preservation, brand recognizability, and integration naturalness. Human evaluations further confirm higher user satisfaction, establishing a practical pathway for sustainable T2V monetization.
