PaliGemma: A versatile 3B VLM for transfer

Abstract

PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that transfers effectively to downstream tasks, and it achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks as well as more specialized tasks such as remote sensing and segmentation.
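The abstract only names the two components. As a rough, hypothetical illustration of how an image encoder and a decoder-only language model are commonly composed, the sketch below maps the encoder's patch embeddings through a linear projection to the language model's embedding width and places them in front of the text token embeddings. It is not PaliGemma's implementation: the function names, the toy widths, and the 224x224 input with 14-pixel patches are assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]


def num_image_tokens(image_size: int, patch_size: int) -> int:
    """Count the patch tokens a ViT-style encoder emits for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def linear_project(x: Vector, weight: List[Vector]) -> Vector:
    """Map one encoder embedding (width d_in) to the decoder width (d_out).

    `weight` is a d_out x d_in matrix stored as a list of rows.
    """
    for row in weight:
        if len(row) != len(x):
            raise ValueError("weight rows must match the embedding width")
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def build_decoder_input(
    image_embeddings: List[Vector],
    weight: List[Vector],
    text_embeddings: List[Vector],
) -> List[Vector]:
    """Project every image token to the decoder width and prepend it to the text tokens.

    Assumption: image tokens come first, followed by the text tokens, forming
    one input sequence for the language model.
    """
    projected = [linear_project(e, weight) for e in image_embeddings]
    return projected + text_embeddings


if __name__ == "__main__":
    # Assumed setup: 224x224 input with 14-pixel patches gives a 16x16 patch grid.
    print(num_image_tokens(224, 14))  # 256

    # Toy widths: 2 image tokens of width 3 are projected to width 2,
    # then followed by 1 text token that already has width 2.
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    seq = build_decoder_input([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]], w, [[9.0, 9.0]])
    print(seq)  # [[1.0, 5.0], [0.5, 1.0], [9.0, 9.0]]
```

In a real model the projection weights are learned and the widths are far larger; the sketch only shows the order of operations: encode the image, project each patch embedding, concatenate with the text embeddings, and hand the resulting sequence to the language model.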