How Well Does Generative Recommendation Generalize?
March 20, 2026
Authors: Yijie Ding, Zitian Guo, Jiacheng Li, Letian Peng, Shuai Shao, Wei Shao, Xiaoqiang Luo, Luke Simon, Jingbo Shang, Julian McAuley, Yupeng Hou
cs.AI
Abstract
A widely held hypothesis for why generative recommendation (GR) models outperform conventional item ID-based models is that they generalize better. However, there is no systematic way to verify this hypothesis beyond a superficial comparison of overall performance. To address this gap, we categorize each data instance based on the specific capability required for a correct prediction: either memorization (reusing item transition patterns observed during training) or generalization (composing known patterns to predict unseen item transitions). Extensive experiments show that GR models perform better on instances that require generalization, whereas item ID-based models perform better when memorization is more important. To explain this divergence, we shift the analysis from the item level to the token level and show that what appears to be item-level generalization often reduces to token-level memorization for GR models. Finally, we show that the two paradigms are complementary. We propose a simple memorization-aware indicator that adaptively combines them on a per-instance basis, leading to improved overall recommendation performance.
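The instance categorization described in the abstract can be illustrated with a minimal sketch: label a test instance "memorization" if its item transition (previous item → target item) was observed in training, and "generalization" otherwise; a per-instance router can then pick between the two model paradigms. All function names, the routing rule, and the toy models below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of memorization vs. generalization instances and a
# simple memorization-aware router. Names and logic are assumptions for
# exposition, not the paper's exact method.

def observed_transitions(train_sequences):
    """Collect every adjacent item transition seen in the training data."""
    seen = set()
    for seq in train_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            seen.add((prev, nxt))
    return seen

def categorize(history, target, seen):
    """Label an instance by the capability a correct prediction requires:
    'memorization' if the (last item -> target) transition was seen in
    training, 'generalization' otherwise."""
    return "memorization" if (history[-1], target) in seen else "generalization"

def memorization_aware_predict(history, seen, id_model, gr_model):
    """Route each instance to the paradigm suited to it: prefer the item
    ID-based model when training contains transitions out of the last item
    (memorization is plausible), the GR model otherwise. A hypothetical
    stand-in for the paper's memorization-aware indicator."""
    sources = {prev for prev, _ in seen}
    model = id_model if history[-1] in sources else gr_model
    return model(history)
```

For example, with training sequences `[["a", "b", "c"], ["b", "d"]]`, the instance `(["a"], "b")` requires memorization (the transition a→b was seen), while `(["c"], "a")` requires generalization, so the router would hand the former to the ID-based model and the latter to the GR model.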