ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation

February 19, 2025
Authors: Yupeng Hou, Jianmo Ni, Zhankui He, Noveen Sachdeva, Wang-Cheng Kang, Ed H. Chi, Julian McAuley, Derek Zhiyuan Cheng
cs.AI

Abstract

Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning the same fixed tokens to identical actions across all sequences without considering contextual relationships. This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. In ActionPiece, each action is represented as a set of item features, which serve as the initial tokens. Given the action sequence corpora, we construct the vocabulary by merging feature patterns as new tokens, based on their co-occurrence frequency both within individual sets and across adjacent sets. Considering the unordered nature of feature sets, we further introduce set permutation regularization, which produces multiple segmentations of action sequences with the same semantics. Experiments on public datasets demonstrate that ActionPiece consistently outperforms existing action tokenization methods, improving NDCG@10 by 6.00% to 12.82%.
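
The vocabulary construction described above counts how often feature tokens co-occur, both inside one action's feature set and across adjacent sets. The following is a minimal sketch of that counting step and of sampling one set permutation. It is not the authors' implementation: the function names `count_pairs` and `permute_sets`, the `key:value` feature strings, and the use of plain unweighted counts are all assumptions made for illustration, and the step that applies a merge to the corpus is left out.

```python
import random
from collections import Counter
from itertools import combinations


def count_pairs(corpus):
    """Count unordered token pairs within each set and across adjacent sets.

    `corpus` is a list of action sequences; each action is a set of feature
    tokens. Unweighted counting is an assumption of this sketch.
    """
    counts = Counter()
    for seq in corpus:
        for i, action in enumerate(seq):
            # Pairs inside a single action's feature set.
            for a, b in combinations(sorted(action), 2):
                counts[(a, b)] += 1
            # Pairs spanning this action and the next one in the sequence.
            if i + 1 < len(seq):
                for a in action:
                    for b in seq[i + 1]:
                        counts[tuple(sorted((a, b)))] += 1
    return counts


def permute_sets(seq, rng):
    """Flatten a sequence of feature sets, shuffling the order inside each set.

    Each call yields one of many orderings of the same action sequence,
    which is the idea behind set permutation regularization.
    """
    flat = []
    for action in seq:
        feats = sorted(action)  # fixed base order so a seeded rng is reproducible
        rng.shuffle(feats)
        flat.extend(feats)
    return flat


# Hypothetical toy corpus: two users, two actions each.
corpus = [
    [{"brand:A", "cat:shoes"}, {"brand:A", "cat:socks"}],
    [{"brand:A", "cat:shoes"}, {"brand:B", "cat:shoes"}],
]
pair_counts = count_pairs(corpus)
best_pair, best_count = pair_counts.most_common(1)[0]
one_ordering = permute_sets(corpus[0], random.Random(0))
```

In a full tokenizer, the most frequent pair would become a new vocabulary token and the counts would be recomputed on the rewritten corpus, repeating until the vocabulary reaches a target size. How a pair that spans two adjacent sets is placed after merging is not stated in the abstract, so this sketch stops at selecting the pair.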
