Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack

Abstract

Autoregressive (AR) image generation models have gained increasing attention for their breakthroughs in synthesis quality, highlighting the need for robust watermarking to prevent misuse. However, existing in-generation watermarking techniques are primarily designed for diffusion models, where watermarks are embedded within diffusion latent states. This design makes them difficult to adapt directly to AR models, which generate images sequentially through token prediction. Moreover, diffusion-based regeneration attacks can effectively erase such watermarks by perturbing diffusion latent states. To address these challenges, we propose Lexical Bias Watermarking (LBW), a novel framework designed for AR models that resists regeneration attacks. LBW embeds watermarks directly into token maps by biasing token selection toward a predefined green list during generation. This approach integrates seamlessly with existing AR models and extends naturally to post-hoc watermarking. To strengthen security against white-box attacks, the green list for each image is randomly sampled from a pool of green lists rather than fixed to a single list. Watermark detection is performed via quantization and statistical analysis of the token distribution. Extensive experiments demonstrate that LBW achieves superior watermark robustness, particularly in resisting regeneration attacks.
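The green-list idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the additive logit bias `delta`, the green-list fraction `gamma`, the one-sided z-test, the threshold, and every function name below are assumptions chosen for illustration. The sketch also works directly on token ids and skips the quantization step that would map an image back to tokens.

```python
import math
import random

def sample_green_list(vocab_size, gamma, seed):
    """Hypothetical helper: pick a pseudo-random subset of token ids as a green list."""
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))

def bias_logits(logits, green, delta):
    """Assumed soft bias: add a constant delta to the logit of every green token."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]

def sample_token(logits, rng):
    """Sample one token id from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def green_z_score(tokens, green, gamma):
    """z-score of the green-token count against the no-watermark expectation gamma * n."""
    n = len(tokens)
    hits = sum(1 for t in tokens if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

def detect(tokens, pool, gamma, threshold=4.0):
    """Flag the token map if any green list in the pool yields a high z-score."""
    best = max(green_z_score(tokens, g, gamma) for g in pool)
    return best > threshold, best

# Toy setup: 256-token vocabulary, pool of 4 green lists, flat base logits
# standing in for an AR model's next-token prediction.
VOCAB, GAMMA, N_TOKENS = 256, 0.5, 1024
pool = [sample_green_list(VOCAB, GAMMA, seed) for seed in range(4)]
rng = random.Random(1234)
green = rng.choice(pool)              # per-image green list drawn from the pool
base_logits = [0.0] * VOCAB

watermarked = [sample_token(bias_logits(base_logits, green, 2.0), rng)
               for _ in range(N_TOKENS)]
clean = [sample_token(base_logits, rng) for _ in range(N_TOKENS)]

wm_flag, wm_z = detect(watermarked, pool, GAMMA)
clean_flag, clean_z = detect(clean, pool, GAMMA)
```

In this toy setting the biased sequence contains far more green tokens than the expected fraction `gamma`, so its z-score is large, while the unbiased sequence stays near zero. Taking the maximum over the pool raises the false-positive rate relative to a single list, so a real detector would need a threshold that accounts for the pool size.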

