Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

Abstract

Learning feature interactions is a critical backbone of building recommender systems. In web-scale applications, learning feature interactions is extremely challenging because the input feature space is sparse and large; meanwhile, manually crafting effective feature interactions is infeasible because the solution space is exponential. We propose to leverage a Transformer-based architecture with attention layers to capture feature interactions automatically. Transformer architectures have seen great success in many domains, such as natural language processing and computer vision. However, the Transformer architecture has seen little adoption for feature interaction modeling in industry. We aim to close this gap. We identify two key challenges in applying the vanilla Transformer architecture to web-scale recommender systems: (1) the Transformer architecture fails to capture heterogeneous feature interactions in the self-attention layer; (2) the serving latency of the Transformer architecture might be too high for deployment in web-scale recommender systems. We first propose a heterogeneous self-attention layer, a simple yet effective modification of the Transformer self-attention layer that takes the heterogeneity of feature interactions into account. We then introduce Hiformer (Heterogeneous Interaction Transformer) to further improve model expressiveness. With low-rank approximation and model pruning, Hiformer enjoys fast inference for online deployment. Extensive offline experiment results corroborate the effectiveness and efficiency of the Hiformer model. We have successfully deployed the Hiformer model in a real-world, large-scale app ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66%).
