Recommender Systems with Generative Retrieval

Abstract

Modern recommender systems leverage large-scale retrieval models that work in two stages: a dual-encoder model is trained to embed queries and candidates in the same space, and an Approximate Nearest Neighbor (ANN) search then selects the top candidates given a query's embedding. In this paper, we propose a new single-stage paradigm: a generative retrieval model that autoregressively decodes the identifiers of the target candidates in one phase. To do this, instead of assigning a randomly generated atomic ID to each item, we generate Semantic IDs: a semantically meaningful tuple of codewords for each item that serves as its unique identifier. We use a hierarchical method called RQ-VAE to generate these codewords. Once we have the Semantic IDs for all items, a Transformer-based sequence-to-sequence model is trained to predict the Semantic ID of the next item. Since this model directly predicts, in an autoregressive manner, the tuple of codewords identifying the next item, it can be considered a generative retrieval model. We show that our recommender system trained in this new paradigm improves on the results achieved by current state-of-the-art (SOTA) models on the Amazon dataset. Moreover, we demonstrate that the sequence-to-sequence model coupled with hierarchical Semantic IDs offers better generalization and hence improves retrieval of cold-start items for recommendations.
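To make the notion of a Semantic ID concrete, the following is a minimal sketch of residual quantization, the step that turns an item embedding into a tuple of codewords. It is not the paper's RQ-VAE: an RQ-VAE learns its encoder, decoder, and codebooks jointly, whereas this sketch assumes fixed toy codebooks and precomputed 2-dimensional embeddings. The names `nearest_codeword`, `semantic_id`, and `CODEBOOKS`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def nearest_codeword(residual: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `residual` (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))


def semantic_id(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Quantize `embedding` level by level, emitting one codeword index per level."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest_codeword(residual, codebook)
        codes.append(idx)
        # The next level only sees what this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Assumption: hand-picked toy codebooks (a learned RQ-VAE would fit these from data).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)],   # level 1: coarse
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)],  # level 2: fine
]

# Assumption: hypothetical item embeddings.
item_a = (1.2, 1.0)
item_b = (1.0, 1.3)
item_c = (-0.9, 1.0)

for name, emb in [("a", item_a), ("b", item_b), ("c", item_c)]:
    print(name, semantic_id(emb, CODEBOOKS))
```

In this toy setup, `item_a` and `item_b` lie close together and share their first codeword, differing only at the finer second level, while `item_c` already differs at the first level. That shared prefix is the hierarchical property the abstract refers to: a sequence model that predicts the tuple one codeword at a time can reuse what it learned about a coarse codeword for items it has seen less often.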
