

One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework

October 3, 2025
Authors: Lorenzo Bianchi, Giacomo Pacini, Fabio Carrara, Nicola Messina, Giuseppe Amato, Fabrizio Falchi
cs.AI

Abstract

Zero-shot captioners are recently proposed models that use shared vision-language representation spaces to caption images without relying on paired image-text data. To caption an image, they decode a text-aligned image feature into text, but they limit their scope to global representations and whole-image captions. We present Patch-ioner, a unified framework for zero-shot captioning that shifts from an image-centric to a patch-centric paradigm, enabling the captioning of arbitrary regions without the need for region-level supervision. Instead of relying on global image representations, we treat individual patches as atomic captioning units and aggregate them to describe arbitrary regions, from single patches to non-contiguous areas and entire images. We analyze the key ingredients that enable current latent captioners to work in our newly proposed framework. Experiments demonstrate that backbones producing meaningful, dense visual features, such as DINO, are key to achieving state-of-the-art performance in multiple region-based captioning tasks. Compared to other baselines and state-of-the-art competitors, our models achieve better performance on zero-shot dense, region-set, and a newly introduced trace captioning task, highlighting the effectiveness of patch-wise semantic representations for scalable caption generation. Project page at https://paciosoft.com/Patch-ioner/.
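The patch-centric idea (patches as atomic units, aggregated into arbitrary regions) can be illustrated with a minimal sketch. It assumes a backbone has already produced a grid of per-patch feature vectors and that a region is simply a set of (row, col) patch indices. Mean pooling followed by L2 normalization is an assumption made here for illustration, not necessarily the aggregation the authors use. The helper names (`aggregate_region`, `whole_image`, `trace_to_patches`), the 14-pixel patch size, and the toy feature values are all hypothetical. The text decoder that would turn the pooled vector into a caption is not shown.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = List[float]
PatchGrid = List[List[Vector]]  # grid[row][col] -> D-dim patch feature
Patch = Tuple[int, int]         # (row, col) index into the patch grid


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate_region(grid: PatchGrid, region: Set[Patch]) -> Vector:
    """Hypothetical aggregation: mean-pool the selected patches, then L2-normalize.

    The region may be a single patch, a non-contiguous set, or every patch.
    """
    if not region:
        raise ValueError("region must contain at least one patch")
    dim = len(grid[0][0])
    total = [0.0] * dim
    for row, col in region:
        for i, x in enumerate(grid[row][col]):
            total[i] += x
    return l2_normalize([x / len(region) for x in total])


def whole_image(grid: PatchGrid) -> Set[Patch]:
    """Region covering every patch of the grid (whole-image captioning)."""
    return {(r, c) for r in range(len(grid)) for c in range(len(grid[0]))}


def trace_to_patches(trace: Sequence[Tuple[float, float]], patch_size: int,
                     rows: int, cols: int) -> Set[Patch]:
    """Map pixel-space (x, y) trace points to the patches they fall in."""
    patches: Set[Patch] = set()
    for x, y in trace:
        # Clamp to the grid so points on the image border stay valid.
        row = min(max(int(y // patch_size), 0), rows - 1)
        col = min(max(int(x // patch_size), 0), cols - 1)
        patches.add((row, col))
    return patches


# Toy 2x2 grid of 2-D "patch features" (made-up numbers for illustration).
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]

single = aggregate_region(grid, {(0, 0)})           # one patch
sparse = aggregate_region(grid, {(0, 0), (1, 1)})   # non-contiguous region
full = aggregate_region(grid, whole_image(grid))    # entire image
traced = trace_to_patches([(3, 3), (20, 4), (21, 30)],
                          patch_size=14, rows=2, cols=2)
trace_feature = aggregate_region(grid, traced)      # trace-conditioned feature
```

The design point this sketch tries to capture is that every task reduces to choosing a set of patches: a single patch, a box or mask, a mouse trace, or the full grid all go through the same aggregation step. Only the region selection changes, which is why dense, semantically meaningful patch features matter more here than a strong global image embedding.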