RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
June 27, 2025
Authors: Ronald Fecso, José Morano, Ursula Schmidt-Erfurth, Hrvoje Bogunović
cs.AI
Abstract
The rise of imaging techniques such as optical coherence tomography (OCT) and
advances in deep learning (DL) have enabled clinicians and researchers to
streamline retinal disease staging. A popular DL approach is self-supervised
learning (SSL), where models learn from vast amounts of unlabeled data,
avoiding costly annotation. SSL has allowed the development of foundation
models (FMs), large models that can be used for a variety of downstream tasks.
However, existing FMs for OCT, trained solely on image data, lack a
comprehensive and robust semantic understanding of images, as evidenced by
their downstream performance (especially for complex tasks), and thus require
supervised fine-tuning (which may be infeasible) to better adapt to specific
applications and populations. To address this, we propose RetFiner, an SSL
vision-language refinement scheme that improves the representations of existing
FMs and enables their efficient and direct adaptation to specific populations
for improved downstream performance. Our method uses a diverse set of training
objectives which take advantage of the rich supervisory signal found in textual
data. We tested RetFiner on the retinal FMs RETFound, UrFound, and VisionFM,
showing significant improvements in linear probing performance on seven highly
diverse OCT classification tasks, with average increases of 5.8, 3.9, and 2.1
percentage points over their baselines, respectively. Our code and model
weights are publicly available at https://github.com/ronnief1/RetFiner.
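
The abstract reports results under linear probing, in which the foundation model is kept frozen and only a linear classifier is trained on its output features. The following is a minimal sketch of that evaluation protocol in plain Python, not the RetFiner code: `frozen_encoder` is a hypothetical stand-in for a pretrained model, the two-class toy data is synthetic, and the optimizer settings are arbitrary assumptions chosen for illustration.

```python
import math
import random


def frozen_encoder(image):
    """Hypothetical stand-in for a frozen foundation model.

    It maps an input (here just a list of floats) to a feature vector
    and has no trainable parameters, which is the point of linear probing.
    """
    return [math.tanh(x) for x in image]


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression head with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Only the linear head (w, b) is updated; the encoder is untouched.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def make_toy_data(n, seed):
    """Synthetic two-class data; a placeholder for real labeled images."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        data.append([center + rng.gauss(0, 0.3), rng.gauss(0, 0.3)])
        labels.append(y)
    return data, labels


train_x, train_y = make_toy_data(200, seed=0)
test_x, test_y = make_toy_data(100, seed=1)

# Features are extracted once with the frozen encoder, then reused.
train_f = [frozen_encoder(x) for x in train_x]
test_f = [frozen_encoder(x) for x in test_x]

w, b = train_linear_probe(train_f, train_y)
accuracy = sum(predict(w, b, f) == y for f, y in zip(test_f, test_y)) / len(test_y)
print(f"linear-probe accuracy: {accuracy:.2f}")
```

Because the encoder never changes, the probe's accuracy depends only on how linearly separable the frozen features are, which is why linear probing is commonly used to compare the representations of a baseline model and a refined one.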