Taming Generative Synthetic Data for X-ray Prohibited Item Detection
November 19, 2025
Authors: Jialong Sun, Hongguang Zhu, Weizhe Liu, Yunda Sun, Renshuai Tao, Yunchao Wei
cs.AI
Abstract
Training prohibited item detection models requires a large number of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address this data shortage, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline: labor-intensive foreground extraction in the first stage, then image compositing in the second. Such a pipeline incurs unavoidable extra labor cost and is inefficient. In this paper, we propose Xsyn, a one-stage X-ray security image synthesis pipeline based on text-to-image generation, which incorporates two strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation. The Background Occlusion Modeling (BOM) strategy explicitly models background occlusion in the latent space to enhance imaging complexity. To the best of our knowledge, Xsyn is the first method to achieve high-quality X-ray security image synthesis without extra labor cost. Experiments demonstrate that our method outperforms all previous methods with a 1.2% mAP improvement, and that the synthetic images it generates improve prohibited item detection performance across various X-ray security datasets and detectors. Code is available at https://github.com/pILLOW-1/Xsyn/.
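The abstract does not give the details of CAR or BOM, so the following is only an illustrative sketch of the two ideas, not the authors' implementation. It assumes that box refinement can be approximated by tightening a coarse box around the high-attention cells inside it, and that occlusion can be approximated by a convex blend of two latents. The names `refine_box` and `blend_latents`, the `rel_threshold` and `alpha` parameters, and the use of plain nested lists in place of real diffusion tensors are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Grid = List[List[float]]


def refine_box(attn: Grid, box: Box, rel_threshold: float = 0.5) -> Box:
    """Tighten a coarse box around the high-attention cells inside it.

    Assumption: cells whose attention is at least `rel_threshold` times the
    peak value inside the box belong to the object. This thresholding rule
    is a stand-in, not the rule used by the paper.
    """
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in attn[y0:y1]]
    peak = max((v for row in crop for v in row), default=0.0)
    if peak <= 0.0:
        # No attention signal inside the box: keep the original annotation.
        return box
    cutoff = rel_threshold * peak
    hits = [
        (x0 + dx, y0 + dy)
        for dy, row in enumerate(crop)
        for dx, v in enumerate(row)
        if v >= cutoff
    ]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def blend_latents(fg: Grid, bg: Grid, alpha: float = 0.3) -> Grid:
    """Convex blend of two same-shaped latents.

    Assumption: mixing a background latent into the foreground latent with
    weight `alpha` is a crude proxy for modeling occlusion in latent space.
    """
    return [
        [(1.0 - alpha) * f + alpha * b for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(fg, bg)
    ]


if __name__ == "__main__":
    # Toy 8x8 attention map with a bright 3x3 patch.
    attn = [[0.0] * 8 for _ in range(8)]
    for y in range(2, 5):
        for x in range(3, 6):
            attn[y][x] = 1.0
    print(refine_box(attn, (1, 1, 7, 7)))
    print(blend_latents([[1.0, 1.0]], [[0.0, 0.0]], alpha=0.25))
```

In a real pipeline the attention map would come from the diffusion model's cross-attention layers for the token naming the prohibited item, and the latents would be the model's latent tensors; both are replaced here by toy grids to keep the sketch self-contained.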