CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
July 10, 2024
Authors: Raza Imam, Mohammed Talha Alam, Umaima Rahman, Mohsen Guizani, Fakhri Karray
cs.AI
Abstract
Existing vision-text contrastive learning models enhance representation
transferability and support zero-shot prediction by matching paired image and
caption embeddings while pushing unrelated pairs apart. However, astronomical
image-label datasets are significantly smaller compared to general image and
label datasets available from the internet. We introduce CosmoCLIP, an
astronomical image-text contrastive learning framework fine-tuned from the
pre-trained CLIP model using SpaceNet images and BLIP-based captions. SpaceNet,
obtained via FLARE, comprises ~13k optimally distributed images, while BLIP
acts as a rich knowledge extractor. The rich semantics derived from the
SpaceNet images and BLIP descriptions, when learned contrastively, enable CosmoCLIP to
achieve superior generalization across various in-domain and out-of-domain
tasks. Our results demonstrate that CosmoCLIP is a straightforward yet powerful
framework, significantly outperforming CLIP in zero-shot classification and
image-text retrieval tasks.
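The abstract's core mechanism, matching paired image and caption embeddings while pushing unrelated pairs apart, is the symmetric contrastive (InfoNCE-style) objective used in CLIP-style training. The sketch below illustrates that loss on a batch of paired embeddings; it is a minimal NumPy illustration under assumed inputs, not the paper's actual implementation (the function name and temperature value are hypothetical).

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays where row i of each is a matched
    image-caption pair. Matched pairs should end up with high cosine
    similarity, mismatched pairs with low similarity.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (N, N) similarity matrix
    labels = np.arange(len(img))         # matched pairs lie on the diagonal

    def xent(l):
        # cross-entropy of each row against its diagonal target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly matched, mutually orthogonal embeddings the loss is near zero; shuffling the captions against the images drives it up, which is the signal that pulls paired embeddings together during fine-tuning.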