Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack

Abstract

Autoregressive (AR) image generation models have attracted growing attention for their breakthroughs in synthesis quality, which heightens the need for robust watermarking to prevent misuse. However, existing in-generation watermarking techniques are designed primarily for diffusion models and embed the watermark in diffusion latent states. This design is difficult to adapt directly to AR models, which generate images sequentially through token prediction. Moreover, diffusion-based regeneration attacks can effectively erase such watermarks by perturbing the diffusion latent states. To address these challenges, we propose Lexical Bias Watermarking (LBW), a novel framework for AR models that resists regeneration attacks. LBW embeds watermarks directly into token maps by biasing token selection toward a predefined green list during generation. This approach integrates seamlessly with existing AR models and extends naturally to post-hoc watermarking. To strengthen security against white-box attacks, LBW does not rely on a single green list; instead, the green list for each image is randomly sampled from a pool of green lists. Watermark detection is performed via quantization and statistical analysis of the token distribution. Extensive experiments demonstrate that LBW achieves superior watermark robustness, particularly against regeneration attacks.
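
As a rough illustration of the green-list idea described above, the following minimal sketch biases token sampling toward a green list and then tests a token sequence for an excess of green tokens. It is not the authors' implementation. The function names, the additive logit bias `delta`, the green fraction `gamma`, the one-proportion z-test, and the detection threshold are all assumptions introduced here for illustration. The sketch also assumes the token sequence has already been recovered from the image by the model's quantizer.

```python
import math
import random


def make_green_list(vocab_size, gamma, seed):
    """Pick a pseudo-random subset of token ids as a green list (hypothetical helper)."""
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def sample_biased(logits, green, delta, rng):
    """Sample one token id after adding `delta` to the logits of green tokens.

    The additive bias is an assumption; the abstract only states that token
    selection is biased toward the green list.
    """
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    peak = max(biased)  # subtract the maximum for numerical stability
    weights = [math.exp(b - peak) for b in biased]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def green_z_score(tokens, green, gamma):
    """One-proportion z-score of the green-token count against the null rate `gamma`."""
    n = len(tokens)
    hits = sum(t in green for t in tokens)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


def detect(tokens, pool, gamma, threshold=4.0):
    """Score the tokens against every green list in the pool.

    Returns (flagged, best_z). The threshold is an illustrative choice and is
    not corrected for testing several lists at once.
    """
    best = max(green_z_score(tokens, green, gamma) for green in pool)
    return best > threshold, best
```

A possible usage, with toy values chosen only for the example: build a pool with `pool = [make_green_list(1024, 0.5, s) for s in range(4)]`, draw one list per image with `random.choice(pool)`, generate tokens with `sample_biased`, and call `detect(tokens, pool, 0.5)` on the recovered token map. Because the detector does not know which list was used, it scores the tokens against every list in the pool and reports the largest z-score.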
