Scaling Up LLM Reviews for Google Ads Content Moderation
February 7, 2024
Authors: Wei Qiao, Tushar Dogra, Otilia Stretcu, Yu-Han Lyu, Tiantian Fang, Dongjin Kwon, Chun-Ta Lu, Enming Luo, Yuan Wang, Chih-Chun Chia, Ariel Fuxman, Fangzhou Wang, Ranjay Krishna, Mehmet Tek
cs.AI
Abstract
Large language models (LLMs) are powerful tools for content moderation, but
their inference costs and latency make them prohibitive for casual use on large
datasets, such as the Google Ads repository. This study proposes a method for
scaling up LLM reviews for content moderation in Google Ads. First, we use
heuristics to select candidates via filtering and duplicate removal, and create
clusters of ads for which we select one representative ad per cluster. We then
use LLMs to review only the representative ads. Finally, we propagate the LLM
decisions for the representative ads back to their clusters. This method
reduces the number of reviews by more than 3 orders of magnitude while
achieving 2x the recall of a baseline non-LLM model. The success of this
approach is a strong function of the representations used in clustering and
label propagation; we found that cross-modal similarity representations yield
better results than uni-modal representations.
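The cluster-review-propagate pattern the abstract describes can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the greedy threshold clustering, the choice of the first cluster member as representative, and the `llm_review` callback are all stand-in assumptions; the paper uses cross-modal similarity representations for the embeddings, which are simply taken as input vectors here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_cluster(embeddings, threshold=0.9):
    """Assign each item to the first cluster whose representative is
    similar enough; otherwise start a new cluster. Each cluster is a
    list of item indices; the first index is the representative."""
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def moderate(ads, embeddings, llm_review, threshold=0.9):
    """Review only one representative ad per cluster with the
    (expensive) LLM, then propagate its decision to the cluster."""
    labels = [None] * len(ads)
    for cluster in greedy_cluster(embeddings, threshold):
        decision = llm_review(ads[cluster[0]])  # one LLM call per cluster
        for i in cluster:
            labels[i] = decision
    return labels
```

With near-duplicate embeddings, two similar ads share a single review: `moderate(["a", "b", "c"], [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], review_fn)` makes only two calls to `review_fn` for three ads. The cost saving scales with cluster size, which is how the review count drops by orders of magnitude while labels still cover the full repository.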