How Well Does Generative Recommendation Generalize?

Abstract

A widely held hypothesis for why generative recommendation (GR) models outperform conventional item ID-based models is that they generalize better. However, there are few systematic ways to verify this hypothesis beyond a superficial comparison of overall performance. To address this gap, we categorize each data instance by the capability required for a correct prediction: either memorization (reusing item transition patterns observed during training) or generalization (composing known patterns to predict unseen item transitions). Extensive experiments show that GR models perform better on instances that require generalization, whereas item ID-based models perform better when memorization matters more. To explain this divergence, we shift the analysis from the item level to the token level and show that what appears to be item-level generalization often reduces to token-level memorization for GR models. Finally, we show that the two paradigms are complementary and propose a simple memorization-aware indicator that adaptively combines them on a per-instance basis, improving overall recommendation performance.
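
As a rough illustration of the instance categorization described in the abstract, the sketch below labels a test instance as memorization if its item transition was observed in training and as generalization otherwise. It is a simplification under stated assumptions, not the authors' procedure: it checks only the one-hop transition from the most recent item to the target, the abstract does not give the exact criterion, and the names `train_transitions` and `categorize` are hypothetical.

```python
from collections import Counter


def train_transitions(train_sequences):
    """Count every consecutive (previous item, next item) pair seen in training."""
    counts = Counter()
    for seq in train_sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            counts[(prev_item, next_item)] += 1
    return counts


def categorize(history, target, transition_counts):
    """Label a test instance by whether its last transition was seen in training.

    Assumption: only the transition from the most recent item to the target
    is checked; the actual criterion may be broader.
    """
    if not history:
        # With no history there is no transition to reuse.
        return "generalization"
    seen = transition_counts[(history[-1], target)] > 0
    return "memorization" if seen else "generalization"


# Toy data, invented purely for illustration.
train = [["a", "b", "c"], ["b", "c", "d"]]
counts = train_transitions(train)
labels = [
    categorize(["a", "b"], "c", counts),  # b -> c was observed in training
    categorize(["a", "c"], "b", counts),  # c -> b was never observed
]
```

Splitting evaluation metrics by such labels would allow the two model families to be compared separately on each subset, which is the kind of breakdown the abstract describes.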
