EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

Abstract

Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, yet producing high-quality effects typically requires expert knowledge and costly production pipelines. Existing AIGC systems face significant challenges in VFX generation due to the scarcity of effect-specific data and the inherent difficulty of modeling supernatural or stylized effects. Moreover, these approaches often require per-effect fine-tuning, which severely limits their scalability and generalization to novel VFX. In this work, we present EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model to interpret high-level effect semantics and reason about how they should adapt to a target subject, while a diffusion transformer leverages in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, to improve generalization and scalability, we construct EffectData, the largest high-quality synthetic dataset, containing 130k videos across 3k VFX categories. Experiments show that EffectMaker achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io
