

Automatic Creative Selection with Cross-Modal Matching

February 28, 2024
Authors: Alex Kim, Jia Huang, Rob Monarch, Jerry Kwac, Anikesh Kamath, Parmeshwar Khurd, Kailash Thiyagarajan, Goodman Gu
cs.AI

Abstract

Application developers advertise their Apps by creating product pages with App images and bidding on search terms. It is therefore crucial for App images to be highly relevant to the search terms. Solutions to this problem require an image-text matching model that predicts the quality of the match between a chosen image and the search terms. In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a pre-trained LXMERT model. We show that, compared to the CLIP model and a baseline that uses a Transformer model for search terms and a ResNet model for images, our approach significantly improves matching accuracy. We evaluate it using two sets of labels: advertiser-associated (image, search term) pairs for a given application, and human ratings of the relevance between (image, search term) pairs. Our approach achieves an AUC of 0.96 on advertiser-associated ground truth, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 8% and 14%, respectively. On human-labeled ground truth, it achieves an AUC of 0.95, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 16% and 17%, respectively.
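
The abstract does not spell out the fine-tuning setup. Below is a minimal sketch of what scoring an (App image, search term) pair with a fine-tuned LXMERT could look like, assuming the HuggingFace `unc-nlp/lxmert-base-uncased` checkpoint and a binary match head over LXMERT's pooled cross-modal output. The region-feature extraction (LXMERT expects detector region features, e.g., from Faster R-CNN), the head design, and the loss are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): binary image-text match scoring with LXMERT.
import torch
import torch.nn as nn
from transformers import LxmertModel, LxmertTokenizer

class LxmertMatcher(nn.Module):
    def __init__(self, backbone="unc-nlp/lxmert-base-uncased"):
        super().__init__()
        self.lxmert = LxmertModel.from_pretrained(backbone)
        # Assumed head: match/no-match logit from the pooled cross-modal output.
        self.head = nn.Linear(self.lxmert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, visual_feats, visual_pos):
        out = self.lxmert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            visual_feats=visual_feats,  # (batch, num_boxes, 2048) region features
            visual_pos=visual_pos,      # (batch, num_boxes, 4) normalized boxes
        )
        return self.head(out.pooled_output).squeeze(-1)  # match logit

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertMatcher()

# Toy batch: one search term plus random stand-ins for detector outputs.
enc = tokenizer(["photo editing app"], return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)  # 36 regions is a common detector setting
visual_pos = torch.rand(1, 36, 4)

logit = model(enc.input_ids, enc.attention_mask, visual_feats, visual_pos)
loss = nn.BCEWithLogitsLoss()(logit, torch.tensor([1.0]))  # label: pair matches
loss.backward()  # fine-tune end-to-end
```

At inference time, `torch.sigmoid(logit)` would give a relevance score per pair, which can then be ranked or thresholded, e.g., to compute the AUC figures the paper reports against advertiser-associated and human-labeled ground truth.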
