Taming Generative Synthetic Data for X-ray Prohibited Item Detection
November 19, 2025
Authors: Jialong Sun, Hongguang Zhu, Weizhe Liu, Yunda Sun, Renshuai Tao, Yunchao Wei
cs.AI
Abstract
Training prohibited item detection models requires a large number of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address this data shortage, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline: labor-intensive foreground extraction in the first stage, followed by image compositing in the second. Such a pipeline is inefficient and unavoidably incurs extra labor cost. In this paper, we propose Xsyn, a one-stage X-ray security image synthesis pipeline based on text-to-image generation, which incorporates two strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation. The Background Occlusion Modeling (BOM) strategy explicitly models background occlusion in the latent space to enhance imaging complexity. To the best of our knowledge, Xsyn is the first method to achieve high-quality X-ray security image synthesis without extra labor cost. Experiments demonstrate that our method outperforms all previous methods by 1.2% mAP, and that the synthetic images it generates improve prohibited item detection performance across various X-ray security datasets and detectors. Code is available at https://github.com/pILLOW-1/Xsyn/.
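To make the idea behind CAR more concrete, the following is a minimal sketch of how a cross-attention map could be used to tighten a bounding box. It is not the authors' implementation: the function name `refine_box_with_attention`, the relative threshold `rel_threshold`, and the assumption that the attention map has already been resized to image resolution are all hypothetical choices made for illustration. The abstract does not specify how Xsyn performs the refinement.

```python
import numpy as np


def refine_box_with_attention(attn, box, rel_threshold=0.5):
    """Shrink a box to the high-attention region inside it.

    Illustrative assumption, not the Xsyn algorithm.
    attn: 2D array (H, W) of non-negative attention values for one item token,
          assumed to be at image resolution.
    box:  (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    crop = attn[y0:y1, x0:x1]
    # Nothing to refine if the box is empty or carries no attention.
    if crop.size == 0 or crop.max() <= 0:
        return box
    # Keep pixels whose attention is at least a fraction of the peak in the box.
    mask = crop >= rel_threshold * crop.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # Tight box around the surviving pixels, mapped back to image coordinates.
    return (
        x0 + int(cols[0]),
        y0 + int(rows[0]),
        x0 + int(cols[-1]) + 1,
        y0 + int(rows[-1]) + 1,
    )


# Toy example: a loose box covering the whole 8x8 map, with attention
# concentrated on a 2x4 region.
attn = np.full((8, 8), 0.1)
attn[3:5, 2:6] = 1.0
print(refine_box_with_attention(attn, (0, 0, 8, 8)))  # (2, 3, 6, 5)
```

A relative threshold is used here only because it is simple and scale-independent; the actual refinement rule in Xsyn may differ.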