MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

Abstract

Large Language Models (LLMs), known for their versatility with textual data, are increasingly being explored for medical image segmentation, a crucial task for accurate diagnostic imaging. This study enhances Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach incorporates a frozen LLM transformer block into the encoder of a ViT-based model and yields substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning, together with a Multi-Scale Fusion Block that aggregates features across different scales. The enhanced model raises the average Dice score from 0.74 to 0.79 and also improves accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformer blocks in refining medical image segmentation and highlight their potential to improve model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
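
The abstract reports gains in the Dice score and the Jaccard Index but does not spell out how they are computed. As a minimal sketch, assuming the standard overlap definitions on flattened binary masks, the two metrics could be written as follows. The function names `dice_score` and `jaccard_index` are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something taken from the paper or its released code.

```python
def dice_score(pred, target):
    """Dice = 2 * |P & T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def jaccard_index(pred, target):
    """Jaccard = |P & T| / |P | T| for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return intersection / union


# Toy example: 6-pixel masks, 2 pixels overlap, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))     # 2*2 / (3+3)
print(jaccard_index(pred, target))  # 2 / 4
```

For a single pair of masks the two values are linked by J = D / (2 - D), so they rank one prediction the same way. That identity does not carry over to scores averaged across a dataset, so the reported mean Dice values cannot be converted directly into a mean Jaccard Index.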
