Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts

Abstract

Ensuring the ethical deployment of text-to-image models requires effective techniques to prevent the generation of harmful or inappropriate content. Concept erasure methods offer a promising solution, but existing fine-tuning-based approaches have notable limitations: anchor-free methods risk disrupting sampling trajectories, which leads to visual artifacts, while anchor-based methods rely on the heuristic selection of anchor concepts. To overcome these shortcomings, we introduce a fine-tuning framework, dubbed ANT, which Automatically guides deNoising Trajectories to avoid unwanted concepts. ANT is built on a key insight: reversing the condition direction of classifier-free guidance during mid-to-late denoising stages enables precise content modification without sacrificing early-stage structural integrity. This inspires a trajectory-aware objective that preserves the integrity of the early-stage score function field, which steers samples toward the natural image manifold, without relying on heuristic anchor concept selection. For single-concept erasure, we propose an augmentation-enhanced weight saliency map that precisely identifies the critical parameters contributing most to the unwanted concept, enabling more thorough and efficient erasure. For multi-concept erasure, our objective function offers a versatile plug-and-play solution that significantly boosts performance. Extensive experiments demonstrate that ANT achieves state-of-the-art results in both single- and multi-concept erasure, delivering high-quality, safe outputs without compromising generative fidelity. Code is available at https://github.com/lileyang1210/ANT
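The key insight above can be illustrated with a minimal sketch of how classifier-free guidance combines noise predictions. The abstract does not give ANT's formulation, so this is not the authors' training objective. It assumes the standard combination (unconditional prediction plus a scaled conditional direction) and flips the sign of that direction once denoising passes a switch point. The function name `guided_noise` and the parameters `scale` and `switch_frac` are hypothetical, chosen only for illustration.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, step, num_steps, scale=7.5, switch_frac=0.5):
    """Illustrative classifier-free guidance with a late-stage direction flip.

    `step` counts denoising steps from 0 (noisiest) to num_steps - 1 (cleanest).
    `scale` and `switch_frac` are assumed values, not taken from the paper.
    """
    # Direction that pushes the sample toward the conditioning concept.
    direction = eps_cond - eps_uncond
    # Early steps keep ordinary guidance so global structure forms as usual;
    # mid-to-late steps reverse the direction to move away from the concept.
    sign = 1.0 if step < switch_frac * num_steps else -1.0
    return eps_uncond + sign * scale * direction


# Usage with toy predictions standing in for a denoiser's outputs.
eps_u = np.zeros(4)
eps_c = np.ones(4)
early = guided_noise(eps_u, eps_c, step=0, num_steps=10)  # guided toward the concept
late = guided_noise(eps_u, eps_c, step=9, num_steps=10)   # guided away from the concept
```

In this sketch the early steps are left untouched, which mirrors the abstract's point that early-stage structure should be preserved while the content is modified later in the trajectory.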
