ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation

Abstract

Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning the same fixed tokens to identical actions across all sequences without considering contextual relationships. This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece, which explicitly incorporates context when tokenizing action sequences. In ActionPiece, each action is represented as a set of item features, which serve as the initial tokens. Given a corpus of action sequences, we construct the vocabulary by merging feature patterns into new tokens, based on their co-occurrence frequency both within individual sets and across adjacent sets. Considering the unordered nature of feature sets, we further introduce set permutation regularization, which produces multiple segmentations of action sequences with the same semantics. Experiments on public datasets demonstrate that ActionPiece consistently outperforms existing action tokenization methods, improving NDCG@10 by 6.00% to 12.82%.
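
The two ideas in the abstract, co-occurrence counting within and across adjacent feature sets, and permuting unordered feature sets, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain unweighted counts, finds only the single most frequent pair rather than running a full merge loop, and every name in it (count_pairs, most_frequent_pair, permuted_views, the "brand:"/"cat:" feature strings) is hypothetical.

```python
from collections import Counter
from itertools import combinations
import random


def count_pairs(corpus):
    """Count feature pairs inside each set and across adjacent sets.

    Assumption: simple unweighted counts. Keys are tagged so that
    within-set and adjacent-set pairs are never mixed up.
    """
    counts = Counter()
    for sequence in corpus:
        for i, action in enumerate(sequence):
            # Pairs inside one action's unordered feature set.
            for a, b in combinations(sorted(action), 2):
                counts[("within", a, b)] += 1
            # Pairs spanning this action and the next one.
            if i + 1 < len(sequence):
                for a in sorted(action):
                    for b in sorted(sequence[i + 1]):
                        counts[("adjacent", a, b)] += 1
    return counts


def most_frequent_pair(corpus):
    """Return the (pair, count) that a merge step would pick first."""
    counts = count_pairs(corpus)
    # Ties are broken by the pair key so the result is deterministic.
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))


def permuted_views(sequence, num_views, seed=0):
    """Flatten one action sequence several times, shuffling each set.

    Every view lists the same features per action, only in a
    different order, so all views carry the same semantics.
    """
    rng = random.Random(seed)
    views = []
    for _ in range(num_views):
        view = []
        for action in sequence:
            features = sorted(action)
            rng.shuffle(features)
            view.extend(features)
        views.append(view)
    return views


# Hypothetical toy corpus: each action is a set of item features.
corpus = [
    [frozenset({"brand:a", "cat:shoes"}), frozenset({"brand:a", "cat:socks"})],
    [frozenset({"brand:a", "cat:shoes"}), frozenset({"brand:b", "cat:socks"})],
    [frozenset({"brand:a", "cat:shoes"})],
]

best_pair, best_count = most_frequent_pair(corpus)
views = permuted_views(corpus[0], num_views=3)
```

In a full tokenizer, the selected pair would be replaced by a new token and the counting repeated until the vocabulary reaches its target size. The permuted views would then be segmented with the learned merges, which is where different token sequences for the same actions would come from.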