
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

November 10, 2023
Authors: Huan Gui, Ruoxi Wang, Ke Yin, Long Jin, Maciej Kula, Taibai Xu, Lichan Hong, Ed H. Chi
cs.AI

Abstract

Learning feature interactions is the critical backbone of building recommender systems. In web-scale applications, learning feature interactions is extremely challenging due to the sparse and large input feature space; meanwhile, manually crafting effective feature interactions is infeasible because of the exponential solution space. We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions. Transformer architectures have witnessed great success in many domains, such as natural language processing and computer vision. However, there has not been much adoption of the Transformer architecture for feature interaction modeling in industry. We aim to close this gap. We identify two key challenges in applying the vanilla Transformer architecture to web-scale recommender systems: (1) the Transformer architecture fails to capture heterogeneous feature interactions in the self-attention layer; (2) the serving latency of the Transformer architecture may be too high for deployment in web-scale recommender systems. We first propose a heterogeneous self-attention layer, a simple yet effective modification to the self-attention layer in Transformers, to take into account the heterogeneity of feature interactions. We then introduce Hiformer (Heterogeneous Interaction Transformer) to further improve model expressiveness. With low-rank approximation and model pruning, Hiformer enjoys fast inference for online deployment. Extensive offline experiment results corroborate the effectiveness and efficiency of the Hiformer model. We have successfully deployed the Hiformer model to a real-world, large-scale app ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66%).
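
To make the core idea concrete, below is a minimal, hypothetical sketch of a heterogeneous self-attention layer in PyTorch. It assumes one interpretation of the abstract: instead of a single shared query/key/value projection, each feature slot gets its own projections, so attention scores depend on which pair of features is interacting. All names, shapes, and the per-feature parameterization are illustrative assumptions, not the paper's exact formulation, and the low-rank approximation and pruning mentioned in the abstract are omitted.

```python
# Hypothetical sketch of a heterogeneous self-attention layer (not the paper's
# exact architecture): per-feature Q/K/V projections so every feature pair has
# its own interaction parameterization.
import math
import torch
import torch.nn as nn


class HeterogeneousSelfAttention(nn.Module):
    def __init__(self, num_features: int, d_model: int, d_head: int):
        super().__init__()
        # One projection per feature slot instead of a single shared projection.
        self.q_proj = nn.Parameter(torch.randn(num_features, d_model, d_head) * 0.02)
        self.k_proj = nn.Parameter(torch.randn(num_features, d_model, d_head) * 0.02)
        self.v_proj = nn.Parameter(torch.randn(num_features, d_model, d_head) * 0.02)
        self.scale = 1.0 / math.sqrt(d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features, d_model) -- one embedding per input feature.
        q = torch.einsum("bfd,fdh->bfh", x, self.q_proj)
        k = torch.einsum("bfd,fdh->bfh", x, self.k_proj)
        v = torch.einsum("bfd,fdh->bfh", x, self.v_proj)
        # Attention scores over feature pairs, then weighted sum of values.
        attn = torch.softmax(torch.einsum("bih,bjh->bij", q, k) * self.scale, dim=-1)
        return torch.einsum("bij,bjh->bih", attn, v)  # (batch, num_features, d_head)


if __name__ == "__main__":
    layer = HeterogeneousSelfAttention(num_features=8, d_model=32, d_head=16)
    out = layer(torch.randn(4, 8, 32))
    print(out.shape)  # torch.Size([4, 8, 16])
```

In a vanilla Transformer the same projection matrices are shared across all positions; making them feature-specific, as sketched above, is one way to let the model treat interactions between different feature types differently, which is the heterogeneity the abstract refers to.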