
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

November 19, 2025
Authors: Geon Choi, Hangyul Yoon, Hyunju Shin, Hyunki Park, Sang Hoon Seo, Eunho Yang, Edward Choi
cs.AI

Abstract

The applicability of current lesion segmentation models for chest X-rays (CXRs) has been limited both by a small number of target labels and the reliance on long, detailed expert-level text inputs, creating a barrier to practical use. To address these limitations, we introduce a new paradigm: instruction-guided lesion segmentation (ILS), which is designed to segment diverse lesion types based on simple, user-friendly instructions. Under this paradigm, we construct MIMIC-ILS, the first large-scale instruction-answer dataset for CXR lesion segmentation, using our fully automated multimodal pipeline that generates annotations from chest X-ray images and their corresponding reports. MIMIC-ILS contains 1.1M instruction-answer pairs derived from 192K images and 91K unique segmentation masks, covering seven major lesion types. To empirically demonstrate its utility, we introduce ROSALIA, a vision-language model fine-tuned on MIMIC-ILS. ROSALIA can segment diverse lesions and provide textual explanations in response to user instructions. The model achieves high segmentation and textual accuracy in our newly proposed task, highlighting the effectiveness of our pipeline and the value of MIMIC-ILS as a foundational resource for pixel-level CXR lesion grounding.