Recommender Systems with Generative Retrieval
May 8, 2023
Authors: Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy
cs.AI
Abstract
Modern recommender systems leverage large-scale retrieval models consisting
of two stages: training a dual-encoder model to embed queries and candidates in
the same space, followed by an Approximate Nearest Neighbor (ANN) search to
select top candidates given a query's embedding. In this paper, we propose a
new single-stage paradigm: a generative retrieval model which autoregressively
decodes the identifiers for the target candidates in one phase. To do this,
instead of assigning randomly generated atomic IDs to each item, we generate
Semantic IDs: a semantically meaningful tuple of codewords for each item that
serves as its unique identifier. We use a hierarchical method called RQ-VAE to
generate these codewords. Once we have the Semantic IDs for all the items, a
Transformer-based sequence-to-sequence model is trained to predict the Semantic
ID of the next item. Since this model predicts the tuple of codewords
identifying the next item directly in an autoregressive manner, it can be
considered a generative retrieval model. We show that our recommender system
trained in this new paradigm improves on the results achieved by current SOTA
models on the Amazon dataset. Moreover, we demonstrate that the
sequence-to-sequence model coupled with hierarchical Semantic IDs offers better
generalization and hence improves retrieval of cold-start items for
recommendations.
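
The idea of a Semantic ID as a hierarchical tuple of codewords can be illustrated with plain residual quantization, the quantization step that RQ-VAE builds on. The following is a minimal sketch, not the authors' implementation: the codebooks here are random rather than learned, there is no encoder or decoder, and the names `nearest`, `semantic_id`, `DIM`, `LEVELS`, and `CODEBOOK_SIZE` are hypothetical choices made for illustration.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def semantic_id(embedding, codebooks):
    """Residual quantization: pick one codeword index per level.

    Each level quantizes what the previous levels left unexplained,
    so earlier codewords are coarse and later ones refine them.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the remainder goes to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Assumption: random codebooks stand in for the learned ones.
rng = random.Random(0)
DIM, LEVELS, CODEBOOK_SIZE = 4, 3, 8
codebooks = [
    [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(LEVELS)
]

# Assumption: a random vector stands in for an item's content embedding.
item_embedding = [rng.gauss(0, 1) for _ in range(DIM)]
sid = semantic_id(item_embedding, codebooks)
print(sid)  # a tuple of LEVELS codeword indices
```

In this sketch, a sequence-to-sequence model would then be trained to emit such tuples one codeword at a time for the next item in a user's history; that training step is not shown here.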