Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
November 10, 2023
Authors: Huan Gui, Ruoxi Wang, Ke Yin, Long Jin, Maciej Kula, Taibai Xu, Lichan Hong, Ed H. Chi
cs.AI
Abstract
Learning feature interaction is the critical backbone to building recommender
systems. In web-scale applications, learning feature interaction is extremely
challenging due to the sparse and large input feature space; meanwhile,
manually crafting effective feature interactions is infeasible because of the
exponential solution space. We propose to leverage a Transformer-based
architecture with attention layers to automatically capture feature
interactions. Transformer architectures have witnessed great success in many
domains, such as natural language processing and computer vision. However,
the Transformer architecture has seen little adoption for feature
interaction modeling in industry. We aim to close this gap. We identify two
key challenges for applying the vanilla Transformer architecture to web-scale
recommender systems: (1) the Transformer architecture fails to capture
heterogeneous feature interactions in its self-attention layer; (2) the serving
latency of the Transformer architecture may be too high for deployment in
web-scale recommender systems. We first propose a heterogeneous self-attention
layer, which is a simple yet effective modification to the self-attention layer
in Transformer, to take into account the heterogeneity of feature interactions.
We then introduce Hiformer (Heterogeneous
Interaction Transformer) to further improve the model
expressiveness. With low-rank approximation and model pruning, Hiformer enjoys
fast inference for online deployment. Extensive offline experimental results
corroborate the effectiveness and efficiency of the Hiformer model.
We have successfully deployed the Hiformer model to a real-world, large-scale
App ranking model at Google Play, with significant improvements in key
engagement metrics (up to +2.66%).
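
To make the heterogeneous self-attention idea concrete, here is a minimal NumPy sketch. It assumes one query/key/value projection per feature, so that each feature pair (i, j) is scored through its own bilinear form, unlike vanilla self-attention, where all features share a single projection. The function name and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def heterogeneous_self_attention(X, W_q, W_k, W_v):
    """Self-attention with per-feature (heterogeneous) projections.

    X:             (F, d)    -- one embedding per feature (F features, width d)
    W_q, W_k, W_v: (F, d, d) -- a distinct projection for each feature

    Feature i queries with W_q[i] and feature j answers with W_k[j], so
    every (i, j) pair interacts through its own bilinear form; vanilla
    self-attention would share one W_q/W_k/W_v across all features.
    """
    F, d = X.shape
    Q = np.einsum('fd,fde->fe', X, W_q)   # (F, d) per-feature queries
    K = np.einsum('fd,fde->fe', X, W_k)   # (F, d) per-feature keys
    V = np.einsum('fd,fde->fe', X, W_v)   # (F, d) per-feature values
    scores = Q @ K.T / np.sqrt(d)         # (F, F) pairwise interaction scores
    return softmax(scores, axis=-1) @ V   # (F, d) interaction-aware outputs

# Toy usage: 4 features with 8-dimensional embeddings.
rng = np.random.default_rng(0)
F, d = 4, 8
out = heterogeneous_self_attention(
    rng.normal(size=(F, d)),
    rng.normal(size=(F, d, d)) * 0.1,
    rng.normal(size=(F, d, d)) * 0.1,
    rng.normal(size=(F, d, d)) * 0.1,
)
print(out.shape)  # (4, 8)
```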
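The fast-inference claim rests on low-rank approximation and model pruning of such per-feature projections. Below is a hedged sketch of the low-rank part only, with an illustrative rank r << d; exactly which projections the paper factorizes is an assumption here.

```python
import numpy as np

def low_rank_projection(X, U, V):
    """Per-feature projection with a rank-r factorization W[f] ~= U[f] @ V[f].

    X: (F, d)     -- feature embeddings
    U: (F, d, r)  -- left factors,  r << d
    V: (F, r, d)  -- right factors

    A full per-feature projection costs O(F * d^2) parameters and FLOPs;
    the rank-r factorization cuts that to O(F * d * r).
    """
    H = np.einsum('fd,fdr->fr', X, U)     # (F, r) compress each embedding
    return np.einsum('fr,fre->fe', H, V)  # (F, d) expand back to width d

# Parameter-count comparison for F=50 features, d=64, rank r=8:
F, d, r = 50, 64, 8
print(F * d * d, "params (full) vs", 2 * F * d * r, "params (rank-8)")
# -> 204800 params (full) vs 51200 params (rank-8)
```

The same factorization trick shrinks the attention layer's serving cost roughly in proportion to r / d, which is what makes a Transformer-style model viable under web-scale latency budgets.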