RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models

Abstract

The rise of imaging techniques such as optical coherence tomography (OCT), together with advances in deep learning (DL), has enabled clinicians and researchers to streamline retinal disease staging. A popular DL approach is self-supervised learning (SSL), in which models learn from vast amounts of unlabeled data and so avoid costly annotation. SSL has enabled the development of foundation models (FMs), large models that can be used for a variety of downstream tasks. However, existing FMs for OCT are trained solely on image data and lack a comprehensive and robust semantic understanding of images, as evidenced by their downstream performance, especially on complex tasks. They therefore require supervised fine-tuning, which may be infeasible, to better adapt to specific applications and populations. To address this, we propose RetFiner, an SSL vision-language refinement scheme that improves the representations of existing FMs and enables their efficient and direct adaptation to specific populations for improved downstream performance. Our method uses a diverse set of training objectives that take advantage of the rich supervisory signal found in textual data. We tested RetFiner on the retinal FMs RETFound, UrFound, and VisionFM and observed significant improvements in linear probing performance on seven highly diverse OCT classification tasks, with average increases of 5.8, 3.9, and 2.1 percentage points over their respective baselines. Our code and model weights are publicly available at https://github.com/ronnief1/RetFiner.
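For readers unfamiliar with linear probing, the evaluation mentioned above can be illustrated with a minimal sketch: the encoder stays frozen and only a linear classifier is trained on its output features. Everything below is an assumption made for illustration, not the authors' protocol. `frozen_encoder` is a hypothetical stand-in for a pretrained foundation model, the data are synthetic two-class points rather than OCT scans, and the probe is a plain logistic regression trained by gradient descent.

```python
import math
import random


def frozen_encoder(x):
    """Hypothetical stand-in for a frozen foundation-model encoder.

    A real probe would run the pretrained network here with gradients disabled.
    """
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on fixed features; the encoder is never updated."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                grad_w[i] += (p - y) * f[i]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, f):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


def make_toy_data(n, rng):
    """Synthetic two-class inputs (assumption: a toy substitute for real images)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(center, 0.3), rng.gauss(center, 0.3)], y))
    return data


rng = random.Random(0)
train = make_toy_data(200, rng)
test = make_toy_data(100, rng)

# Features are computed once from the frozen encoder; only w and b are learned.
train_feats = [frozen_encoder(x) for x, _ in train]
w, b = train_linear_probe(train_feats, [y for _, y in train])

accuracy = sum(predict(w, b, frozen_encoder(x)) == y for x, y in test) / len(test)
print(f"linear-probe accuracy: {accuracy:.2f}")
```

Because the encoder's weights never change, the probe's accuracy reflects how linearly separable the pretrained representations already are, which is why it is a common way to compare foundation models before and after a refinement step.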
