Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

Abstract

The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Although invisible generative watermarking is popular in image synthesis, it remains largely unexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual similarity between the watermark and the cover content, we introduce a hierarchical coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is divided into patches; each patch is assigned to the most visually similar video frame and then localized to the optimal spatial region within that frame for seamless embedding. To enable spatiotemporal fusion of watermark patches across video frames, we develop a 3D wavelet transform-enhanced Mamba architecture with a novel spatiotemporal local scanning strategy, which effectively models long-range dependencies during watermark embedding and retrieval. To the best of our knowledge, this is the first attempt to apply state space models to watermarking, opening new avenues for efficient and robust watermark protection. Extensive experiments demonstrate that Safe-Sora achieves state-of-the-art performance in video quality, watermark fidelity, and robustness, which we attribute largely to the proposed components. We will release our code upon publication.
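The coarse-to-fine matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how visual similarity is measured, so the mean-intensity comparison (coarse step) and the mean absolute pixel difference (fine step) are placeholder assumptions, and the names `match_patches`, `crop`, `mean_intensity`, and `mean_abs_diff` are hypothetical. Images are plain grayscale grids whose sides are assumed to be multiples of the patch size.

```python
from statistics import fmean

Image = list[list[float]]  # grayscale image as rows of pixel intensities


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mean_intensity(img: Image) -> float:
    """Average pixel value; used here as a cheap global descriptor (assumption)."""
    return fmean(px for row in img for px in row)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute pixel difference; lower means more similar (assumption)."""
    return fmean(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def match_patches(watermark: Image, frames: list[Image], size: int):
    """For each watermark patch, return (patch_origin, frame_index, region_origin)."""
    placements = []
    for top in range(0, len(watermark), size):
        for left in range(0, len(watermark[0]), size):
            patch = crop(watermark, top, left, size)
            # Coarse step: choose the frame whose mean intensity is closest
            # to the patch's mean intensity.
            p_mean = mean_intensity(patch)
            f_idx = min(range(len(frames)),
                        key=lambda i: abs(mean_intensity(frames[i]) - p_mean))
            # Fine step: scan non-overlapping regions of the chosen frame and
            # keep the one with the smallest pixel-wise difference.
            frame = frames[f_idx]
            candidates = [(r, c)
                          for r in range(0, len(frame) - size + 1, size)
                          for c in range(0, len(frame[0]) - size + 1, size)]
            region = min(candidates,
                         key=lambda rc: mean_abs_diff(
                             patch, crop(frame, rc[0], rc[1], size)))
            placements.append(((top, left), f_idx, region))
    return placements


# Toy data: a 2x4 watermark (dark left patch, bright right patch), two 2x4 frames.
watermark = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
frames = [
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4]],  # darker frame
    [[0.6, 0.6, 0.9, 0.9], [0.6, 0.6, 0.9, 0.9]],  # brighter frame
]
placements = match_patches(watermark, frames, size=2)
print(placements)  # [((0, 0), 0, (0, 0)), ((0, 2), 1, (0, 2))]
```

In this toy run the dark patch lands in the darkest region of the darker frame and the bright patch in the brightest region of the brighter frame. A practical system would presumably compare learned features rather than raw intensities, and would need a rule for patches that compete for the same region; neither detail is given in the abstract.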

