KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

Abstract

Image ad understanding is a crucial task with wide real-world applications. The task is highly challenging: it involves diverse atypical scenes, real-world entities, and reasoning over scene text. Even so, how to interpret image ads remains relatively under-explored, especially in the era of foundational vision-language models (VLMs), which offer impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark these VLMs and reveal practical challenges in adapting them to image ad understanding. We propose a simple feature adaptation strategy that effectively fuses multimodal information for image ads, and we further strengthen it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.
