

ILIAS: Instance-Level Image retrieval At Scale

February 17, 2025
Authors: Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias
cs.AI

Abstract

This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. The key benefits over existing datasets include large scale, domain diversity, accurate ground truth, and performance that is far from saturated. ILIAS includes query and positive images for 1,000 object instances, manually collected to capture challenging conditions and diverse domains. Large-scale retrieval is conducted against 100 million distractor images from YFCC100M. To avoid false negatives without extra annotation effort, we include only query objects confirmed to have emerged after 2014, i.e. the compilation date of YFCC100M. Extensive benchmarking is performed, with the following observations: i) models fine-tuned on specific domains, such as landmarks or products, excel in that domain but fail on ILIAS; ii) learning a linear adaptation layer using multi-domain class supervision results in performance improvements, especially for vision-language models; iii) local descriptors in retrieval re-ranking are still a key ingredient, especially in the presence of severe background clutter; iv) the text-to-image performance of the vision-language foundation models is surprisingly close to the corresponding image-to-image case. Website: https://vrg.fel.cvut.cz/ilias/
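To make the retrieval setup concrete, the following is a minimal sketch of the image-to-image ranking stage the abstract describes: images (queries and a large distractor database) are embedded into global descriptors by some frozen foundation model, and ranking is done by cosine similarity. The paper does not specify this exact code; the embedding dimensions, random descriptors, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins for descriptors produced by a frozen foundation model.
# In ILIAS the database would hold ~100M distractor descriptors; here we
# use tiny random matrices purely to illustrate the ranking mechanics.
rng = np.random.default_rng(0)
dim = 8                                   # embedding dimension (toy value)
queries = rng.normal(size=(3, dim))       # 3 query descriptors
database = rng.normal(size=(100, dim))    # 100 database descriptors

def l2_normalize(x, axis=-1):
    """Scale rows to unit L2 norm so dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

q = l2_normalize(queries)
db = l2_normalize(database)

# After normalisation, cosine similarity is a single matrix product.
scores = q @ db.T                         # shape (3, 100)
ranks = np.argsort(-scores, axis=1)       # descending: best match first

print(ranks[:, :5])                       # top-5 database indices per query
```

At real scale, the exhaustive matrix product would typically be replaced by an approximate nearest-neighbour index, and the abstract's observation iii) suggests the resulting short-list is then re-ranked with local descriptors.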

