PaliGemma: A versatile 3B VLM for transfer
July 10, 2024
Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, Paul Voigtlaender, Ioana Bica, Ivana Balazevic, Joan Puigcerver, Pinelopi Papalampidi, Olivier Henaff, Xi Xiong, Radu Soricut, Jeremiah Harmsen, Xiaohua Zhai
cs.AI
Abstract
PaliGemma is an open Vision-Language Model (VLM) that is based on the
SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to
be a versatile and broadly knowledgeable base model that is effective to
transfer. It achieves strong performance on a wide variety of open-world tasks.
We evaluate PaliGemma on almost 40 diverse tasks including standard VLM
benchmarks, but also more specialized tasks such as remote-sensing and
segmentation.