UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

Abstract

In recent years, the scaling laws of recommendation models, which govern the relationship between a recommender's performance and its parameter count or FLOPs, have attracted increasing attention. Currently, there are three mainstream architectures for scaling recommendation models: attention-based, TokenMixer-based, and factorization-machine-based methods, which differ fundamentally in both design philosophy and architectural structure. In this paper, we propose UniMixer, a unified scaling architecture for recommender systems that improves scaling efficiency and establishes a theoretical framework unifying the mainstream scaling blocks. By transforming the rule-based TokenMixer into an equivalent parameterized structure, we construct a generalized parameterized feature mixing module whose token mixing patterns can be optimized and learned during model training. The generalized parameterized token mixing also removes the constraint in TokenMixer that the number of heads must equal the number of tokens. Furthermore, we establish a unified design framework for scaling modules in recommender systems, which connects attention-based, TokenMixer-based, and factorization-machine-based methods. To further boost scaling ROI, we design a lightweight UniMixing module, UniMixing-Lite, which further compresses model parameters and computational cost while significantly improving model performance. The scaling curves are shown in the following figure. Extensive offline and online experiments verify the superior scaling ability of UniMixer.
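The abstract states that the rule-based TokenMixer can be rewritten as an equivalent parameterized structure. The NumPy sketch below illustrates that idea under a stated assumption: the rule-based TokenMixer is taken to be the RankMixer-style split-and-regroup of heads (each token is cut into H chunks, and output token h concatenates chunk h of every token). Under that assumption the rule is a fixed permutation matrix P acting on the flattened tokens. Replacing P with a learnable matrix W makes the mixing pattern trainable and lets the number of output tokens be chosen freely. The function names and the dense W are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def rule_based_token_mix(x: np.ndarray, num_heads: int) -> np.ndarray:
    """Assumed RankMixer-style TokenMixer: split and regroup heads.

    x has shape (T, d). Each token is cut into `num_heads` chunks of width
    d // num_heads, and output token h concatenates chunk h of every token.
    The output shape is (num_heads, T * d // num_heads), so it matches the
    input shape (T, d) only when num_heads == T.
    """
    T, d = x.shape
    if d % num_heads != 0:
        raise ValueError("d must be divisible by num_heads")
    heads = x.reshape(T, num_heads, d // num_heads)          # (T, H, d/H)
    return heads.transpose(1, 0, 2).reshape(num_heads, -1)   # (H, T*d/H)


def mixing_as_matrix(T: int, d: int, num_heads: int) -> np.ndarray:
    """Return the fixed 0/1 matrix P with vec(mix(x)) == P @ vec(x)."""
    n = T * d
    P = np.zeros((n, n))
    for j in range(n):
        basis = np.zeros(n)
        basis[j] = 1.0
        # Column j of P is the mixed image of the j-th basis vector.
        P[:, j] = rule_based_token_mix(basis.reshape(T, d), num_heads).reshape(-1)
    return P


def parameterized_token_mix(x: np.ndarray, W: np.ndarray, out_tokens: int) -> np.ndarray:
    """Hypothetical generalized mixing: a learnable W replaces the fixed P.

    W has shape (m, T * d) with m divisible by out_tokens, so the number
    of output tokens is a free choice rather than being tied to num_heads.
    """
    return (W @ x.reshape(-1)).reshape(out_tokens, -1)


rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))

# With num_heads == T, the rule-based mix equals a fixed matrix product.
P = mixing_as_matrix(T, d, num_heads=T)
assert np.allclose(P @ x.reshape(-1), rule_based_token_mix(x, T).reshape(-1))

# Initialising W at P reproduces the rule-based TokenMixer exactly;
# gradient-based training could then move W away from the fixed pattern.
print(np.allclose(parameterized_token_mix(x, P.copy(), T), rule_based_token_mix(x, T)))  # prints True
```

A dense W over all T·d entries is used here only to keep the sketch short; its size grows quadratically in T·d, so a practical module would presumably impose structure on W. With num_heads = 2 and T = 4, the rule-based mix returns 2 tokens of width 16 instead of 4 tokens of width 8, which is the shape mismatch behind the heads-equal-tokens constraint that the parameterized form avoids.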
