MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering

Abstract

Generative diffusion models are increasingly used for medical imaging data augmentation, but text prompting alone cannot produce causally controlled training data. Re-prompting rerolls the entire generation trajectory, altering anatomy, texture, and background. Inversion-based editing methods introduce reconstruction error that causes structural drift. We propose MedSteer, a training-free activation-steering framework for endoscopic synthesis. For each contrastive prompt pair, MedSteer identifies a pathology vector in the cross-attention layers of a diffusion transformer. At inference time, it steers image activations along this vector, generating counterfactual pairs from scratch in which the only difference is the steered concept; all other structure is preserved by construction. We evaluate MedSteer in three experiments on Kvasir v3 and HyperKvasir. For counterfactual generation across three clinical concept pairs, MedSteer achieves flip rates of 0.800, 0.925, and 0.950, outperforming the best inversion-based baseline in both concept flip rate and structural preservation. For dye disentanglement, MedSteer achieves a 75% dye-removal rate versus 20% for PnP and 10% for h-Edit. For downstream polyp detection, augmenting with MedSteer counterfactual pairs yields a ViT AUC of 0.9755 versus 0.9083 for quantity-matched re-prompting, confirming that the counterfactual structure drives the gain. Code is available at https://github.com/phamtrongthang123/medsteer
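
The abstract describes the steering mechanism only at a high level. The sketch below illustrates generic contrastive activation steering under stated assumptions, not MedSteer's published implementation. It assumes the pathology vector is the normalized difference between the mean cross-attention outputs collected under the two prompts of a contrastive pair (for example "with polyp" vs. "without polyp"), and that steering adds a signed multiple of that vector to an image-token activation. The names `concept_vector`, `steer`, `counterfactual_pair`, and the strength parameter `alpha` are hypothetical, and plain Python lists stand in for the real transformer activations.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(row[i] for row in rows) / n for i in range(len(rows[0]))]


def unit(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("prompt pair produced identical mean activations")
    return [x / norm for x in v]


def concept_vector(pos_acts: Sequence[Sequence[float]],
                   neg_acts: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical pathology vector (assumption): normalized difference between
    the mean cross-attention outputs under the positive and negative prompts."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return unit([p - q for p, q in zip(mu_pos, mu_neg)])


def steer(h: Sequence[float], v: Sequence[float], alpha: float) -> Vector:
    """Shift one image-token activation h along v with signed strength alpha."""
    return [x + alpha * d for x, d in zip(h, v)]


def counterfactual_pair(h: Sequence[float], v: Sequence[float],
                        alpha: float) -> Tuple[Vector, Vector]:
    """Steer the same activation toward (+alpha) and away from (-alpha) the concept."""
    return steer(h, v, alpha), steer(h, v, -alpha)


if __name__ == "__main__":
    # Toy 3-d activations: the two prompts differ only in the first coordinate.
    pos = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
    neg = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    v = concept_vector(pos, neg)
    with_concept, without_concept = counterfactual_pair([0.5, 0.2, -0.1], v, 2.0)
    print(v, with_concept, without_concept)
    # -> [1.0, 0.0, 0.0] [2.5, 0.2, -0.1] [-1.5, 0.2, -0.1]
```

Because both members of the pair start from the same activation `h` (the same seed and trajectory), they differ only along `v` at the steered layer. This is the sense in which a pair generated from scratch isolates the concept, in contrast to re-prompting, which changes the whole trajectory. In a real diffusion transformer the shift would then propagate through later nonlinear layers.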
