Conditioned Activation Transport for T2I Safety Steering

March 3, 2026
Authors: Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński, Tomasz Trzciński, Franziska Boenisch, Adam Dziedzic
cs.AI

Abstract

Despite their impressive capabilities, current Text-to-Image (T2I) models remain prone to generating unsafe and toxic content. While activation steering offers a promising inference-time intervention, we observe that linear activation steering frequently degrades image quality when applied to benign prompts. To address this trade-off, we first construct SafeSteerDataset, a contrastive dataset containing 2300 safe and unsafe prompt pairs with high cosine similarity. Leveraging this data, we propose Conditioned Activation Transport (CAT), a framework that employs a geometry-based conditioning mechanism and nonlinear transport maps. By conditioning transport maps to activate only within unsafe activation regions, we minimize interference with benign queries. We validate our approach on two state-of-the-art architectures: Z-Image and Infinity. Experiments demonstrate that CAT generalizes effectively across these backbones, significantly reducing Attack Success Rate while maintaining image fidelity compared to unsteered generations. Warning: This paper contains potentially offensive text and images.
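
The contrast between unconditional linear steering and conditioned steering can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the conditioning geometry or the form of the nonlinear transport maps, so the sketch below assumes a simple cosine-similarity gate against the mean unsafe activation and substitutes a plain difference-of-means shift for the transport map. All function names, the `threshold` parameter, and the `alpha` scale are hypothetical.

```python
import numpy as np

def mean_difference_direction(safe_acts, unsafe_acts):
    """Steering vector pointing from the mean unsafe activation to the mean safe one."""
    return safe_acts.mean(axis=0) - unsafe_acts.mean(axis=0)

def linear_steer(h, v, alpha=1.0):
    """Unconditional linear steering: every activation row is shifted by alpha * v."""
    return h + alpha * v

def conditioned_steer(h, v, unsafe_mean, threshold=0.5, alpha=1.0):
    """Shift only rows whose cosine similarity to the unsafe mean exceeds threshold.

    Assumption: a cosine gate stands in for the paper's geometry-based
    conditioning, and a linear shift stands in for its nonlinear transport map.
    """
    norms = np.linalg.norm(h, axis=-1) * np.linalg.norm(unsafe_mean) + 1e-8
    cos = (h @ unsafe_mean) / norms
    gate = (cos > threshold).astype(h.dtype)[..., None]  # 1.0 inside the unsafe region, else 0.0
    return h + alpha * gate * v

# Toy 2-D activations: unsafe prompts cluster along axis 1, safe ones along axis 0.
unsafe_acts = np.array([[0.0, 2.0], [0.0, 4.0]])
safe_acts = np.array([[2.0, 0.0], [4.0, 0.0]])
v = mean_difference_direction(safe_acts, unsafe_acts)

# Row 0 resembles a benign query, row 1 an unsafe one.
h = np.array([[1.0, 0.0], [0.0, 1.0]])
steered_linear = linear_steer(h, v)
steered_conditioned = conditioned_steer(h, v, unsafe_acts.mean(axis=0))
```

In this toy setting, `linear_steer` moves the benign-like row as well as the unsafe-like row, while `conditioned_steer` leaves the benign-like row unchanged. That mirrors the trade-off the abstract describes, under the stated assumptions only.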