
Automatic Creative Selection with Cross-Modal Matching

February 28, 2024
Authors: Alex Kim, Jia Huang, Rob Monarch, Jerry Kwac, Anikesh Kamath, Parmeshwar Khurd, Kailash Thiyagarajan, Goodman Gu
cs.AI

Abstract

Application developers advertise their Apps by creating product pages with App images and bidding on search terms. It is therefore crucial that App images be highly relevant to the search terms. Solutions to this problem require an image-text matching model to predict the quality of the match between a chosen image and the search terms. In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a pre-trained LXMERT model. We show that, compared to the CLIP model and a baseline using a Transformer model for search terms and a ResNet model for images, we significantly improve matching accuracy. We evaluate our approach using two sets of labels: advertiser-associated (image, search term) pairs for a given application, and human ratings of the relevance between (image, search term) pairs. Our approach achieves a 0.96 AUC score on advertiser-associated ground truth, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 8% and 14%, respectively. On human-labeled ground truth, our approach achieves a 0.95 AUC score, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 16% and 17%, respectively.
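
The paper's code is not included here; the following is a minimal sketch of the approach the abstract describes: fine-tuning a pre-trained LXMERT model with a binary head that scores how well an App image matches a search term. It assumes the Hugging Face `unc-nlp/lxmert-base-uncased` checkpoint and the standard LXMERT input setup of region features precomputed by a Faster R-CNN detector (36 boxes, 2048-d features per image); the `LxmertMatcher` class and the example search term are illustrative, not from the paper.

```python
# Sketch (not the authors' released code): LXMERT + linear match-quality head.
import torch
import torch.nn as nn
from transformers import LxmertModel, LxmertTokenizer

class LxmertMatcher(nn.Module):
    """Hypothetical wrapper: pre-trained LXMERT with a binary matching head."""

    def __init__(self, name: str = "unc-nlp/lxmert-base-uncased"):
        super().__init__()
        self.lxmert = LxmertModel.from_pretrained(name)
        # One logit on the cross-modal pooled ([CLS]) representation.
        self.head = nn.Linear(self.lxmert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, visual_feats, visual_pos):
        out = self.lxmert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            visual_feats=visual_feats,  # [B, 36, 2048] region features
            visual_pos=visual_pos,      # [B, 36, 4] normalized box coords
        )
        return self.head(out.pooled_output).squeeze(-1)  # match logit


tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertMatcher()

# Toy (search term, image) pair; real region features come from a detector.
enc = tokenizer(["coffee shop finder"], return_tensors="pt", padding=True)
feats = torch.randn(1, 36, 2048)  # placeholder visual features
boxes = torch.rand(1, 36, 4)      # placeholder normalized box coordinates

logit = model(enc.input_ids, enc.attention_mask, feats, boxes)
print(torch.sigmoid(logit))  # predicted match quality in [0, 1]

# Fine-tuning would minimize binary cross-entropy against match/no-match
# labels, e.g. nn.BCEWithLogitsLoss()(logit, labels.float()).
```

Scoring held-out (image, search term) pairs with such logits yields the ranking scores over which AUC, the metric reported above, can be computed with a standard implementation such as sklearn.metrics.roc_auc_score.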

