Scaling Up LLM Reviews for Google Ads Content Moderation

Abstract

Large language models (LLMs) are powerful tools for content moderation, but their inference cost and latency make them prohibitive for routine use on large datasets such as the Google Ads repository. This study proposes a method for scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates through filtering and duplicate removal. We then group the candidate ads into clusters and select one representative ad per cluster. Next, we use LLMs to review only the representative ads. Finally, we propagate each representative ad's LLM decision back to its cluster. This method reduces the number of reviews by more than three orders of magnitude while achieving twice the recall of a non-LLM baseline model. The success of this approach depends strongly on the representations used for clustering and label propagation: we found that cross-modal similarity representations yield better results than uni-modal representations.
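The four-stage pipeline described above (candidate selection, clustering, LLM review of representatives, label propagation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the heuristics, the clustering algorithm, or the embedding model. Exact-text deduplication, greedy cosine-threshold clustering, choosing each cluster's seed as its representative, and the names `is_candidate`, `llm_review`, and `threshold` are all hypothetical stand-ins. Embeddings are assumed to be precomputed, for example as cross-modal vectors, which the abstract reports work better than uni-modal ones.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings: Dict[str, Vector], threshold: float) -> List[List[str]]:
    """Hypothetical stand-in for the paper's clustering step: each ad joins the first
    cluster whose seed (first member) is at least `threshold` similar, else seeds a new one."""
    clusters: List[List[str]] = []
    for ad_id, vec in embeddings.items():
        for cluster in clusters:
            if cosine(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(ad_id)
                break
        else:
            clusters.append([ad_id])
    return clusters


def scale_llm_review(
    ads: Dict[str, str],
    embeddings: Dict[str, Vector],
    is_candidate: Callable[[str], bool],
    llm_review: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[Dict[str, str], int]:
    """Return a label for every candidate ad and the number of LLM calls made."""
    # 1. Candidate selection: heuristic filter, then exact-text duplicate removal.
    first_with_text: Dict[str, str] = {}
    canonical: Dict[str, str] = {}  # candidate ad -> first ad with identical text
    for ad_id, text in ads.items():
        if is_candidate(text):
            canonical[ad_id] = first_with_text.setdefault(text, ad_id)
    unique_ids = list(dict.fromkeys(canonical.values()))  # order-preserving dedup

    # 2. Cluster the deduplicated candidates; each cluster's seed is its representative.
    clusters = greedy_cluster({i: embeddings[i] for i in unique_ids}, threshold)

    # 3. Only the representative of each cluster is sent to the (expensive) LLM.
    cluster_label: Dict[str, str] = {}
    for cluster in clusters:
        label = llm_review(ads[cluster[0]])
        # 4. Propagation: every cluster member inherits the representative's label.
        for member in cluster:
            cluster_label[member] = label

    # Exact duplicates inherit the label of the ad they were collapsed into.
    labels = {ad_id: cluster_label[canon] for ad_id, canon in canonical.items()}
    return labels, len(clusters)


# Toy usage: 2-D vectors stand in for real (e.g. cross-modal) ad embeddings.
ads = {
    "a1": "buy cheap pills now",
    "a2": "buy cheap pills now",    # exact duplicate of a1
    "a3": "buy cheap pills today",  # near duplicate of a1
    "a4": "handmade wool scarves",
    "a5": "ok",                     # dropped by the toy heuristic filter
}
embeddings = {"a1": [1.0, 0.0], "a2": [1.0, 0.0], "a3": [0.99, 0.1],
              "a4": [0.0, 1.0], "a5": [0.5, 0.5]}
labels, llm_calls = scale_llm_review(
    ads,
    embeddings,
    is_candidate=lambda text: len(text) > 3,  # toy heuristic filter
    llm_review=lambda text: "violating" if "pills" in text else "ok",  # fake LLM
)
print(llm_calls, labels)
```

In this sketch the number of LLM calls equals the number of clusters rather than the number of ads. Four candidate ads receive labels here from two calls, and that gap is where the reduction in review volume comes from. How well the propagated labels hold up depends entirely on whether the embeddings place policy-equivalent ads in the same cluster, which is consistent with the abstract's finding that the choice of representation drives the method's success.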
