MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering

March 7, 2026
Authors: Trong-Thang Pham, Loc Nguyen, Anh Nguyen, Hien Nguyen, Ngan Le
cs.AI

Abstract

Generative diffusion models are increasingly used for medical imaging data augmentation, but text prompting cannot produce causal training data. Re-prompting rerolls the entire generation trajectory, altering anatomy, texture, and background at once. Inversion-based editing methods introduce reconstruction error that causes structural drift. We propose MedSteer, a training-free activation-steering framework for endoscopic synthesis. MedSteer identifies a pathology vector for each contrastive prompt pair in the cross-attention layers of a diffusion transformer. At inference time, it steers image activations along this vector, generating counterfactual pairs from scratch in which the only difference is the steered concept; all other structure is preserved by construction. We evaluate MedSteer in three experiments on Kvasir v3 and HyperKvasir. On counterfactual generation across three clinical concept pairs, MedSteer achieves flip rates of 0.800, 0.925, and 0.950, outperforming the best inversion-based baseline in both concept flip rate and structural preservation. On dye disentanglement, MedSteer achieves 75% dye removal, compared with 20% for PnP and 10% for h-Edit. On downstream polyp detection, augmenting with MedSteer counterfactual pairs yields a ViT AUC of 0.9755 versus 0.9083 for quantity-matched re-prompting, confirming that the counterfactual structure drives the gain. Code is available at https://github.com/phamtrongthang123/medsteer.
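The steering step described in the abstract can be illustrated with a toy NumPy sketch. It assumes that the pathology vector is a unit difference-of-means direction between the cross-attention outputs produced under the two prompts of a contrastive pair, and that steering adds a scaled copy of that direction (strength `alpha`) to the image-token activations. The functions `cross_attention`, `concept_vector`, and `steer`, the random weights, and the stand-in prompt embeddings are all hypothetical. They are not taken from the MedSteer repository and may differ from the authors' actual procedure.

```python
import numpy as np


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens (queries) attend to text tokens (keys/values)."""
    q = image_tokens @ w_q                          # (n_img, d)
    k = text_tokens @ w_k                           # (n_txt, d)
    v = text_tokens @ w_v                           # (n_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (n_img, n_txt)
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over text tokens
    return attn @ v                                 # (n_img, d)


def concept_vector(image_tokens, pos_text, neg_text, w_q, w_k, w_v):
    """Hypothetical concept direction: normalized difference of mean cross-attention outputs."""
    pos = cross_attention(image_tokens, pos_text, w_q, w_k, w_v).mean(axis=0)
    neg = cross_attention(image_tokens, neg_text, w_q, w_k, w_v).mean(axis=0)
    direction = pos - neg
    return direction / np.linalg.norm(direction)


def steer(activations, direction, alpha):
    """Shift every image-token activation by alpha along the unit concept direction."""
    return activations + alpha * direction


# Toy demo with random weights and stand-in prompt embeddings (all hypothetical).
rng = np.random.default_rng(0)
d = 16
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
image_tokens = rng.normal(size=(64, d))
pos_text = rng.normal(size=(8, d))  # stand-in for a "polyp present" prompt embedding
neg_text = rng.normal(size=(8, d))  # stand-in for the contrasting "no polyp" prompt

v = concept_vector(image_tokens, pos_text, neg_text, w_q, w_k, w_v)
base = cross_attention(image_tokens, neg_text, w_q, w_k, w_v)
steered = steer(base, v, alpha=4.0)

# Every token's projection onto v rises by exactly alpha; orthogonal components are unchanged.
print(np.allclose((steered - base) @ v, 4.0))  # expected: True
```

Because the steered and unsteered activations differ only by `alpha` times the concept direction, everything orthogonal to that direction is identical, which mirrors in miniature the abstract's claim that all other structure is preserved by construction. In a real diffusion transformer, such a shift would presumably be applied at selected layers across the denoising steps with a shared random seed for both members of the pair; which layers, which steps, and what value of `alpha` to use are design choices this sketch does not settle.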
March 16, 2026