Wukong: Towards a Scaling Law for Large-Scale Recommendation
March 4, 2024
Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen
cs.AI
Abstract
Scaling laws play an instrumental role in the sustainable improvement of model quality. Unfortunately, recommendation models to date do not exhibit scaling laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse, any-order interactions simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models in quality. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP, or equivalently up to the GPT-3/LLaMa-2 scale of total training compute, where prior art falls short.
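To make the "stacked factorization machines" idea concrete, here is a minimal, hypothetical PyTorch sketch of one FM-style interaction block that can be stacked. It is our own simplification for illustration, not the paper's implementation: the names `FMBlock`, `num_emb_in`, `num_emb_out`, and the linear projection that re-embeds the pairwise-interaction matrix are all assumptions.

```python
# Hypothetical sketch of a stackable factorization-machine-style block.
# Not the Wukong implementation; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class FMBlock(nn.Module):
    """Takes n input embeddings of dimension d, forms all pairwise
    dot-product interactions, and projects them back into a set of
    output embeddings so that blocks can be stacked."""

    def __init__(self, num_emb_in: int, num_emb_out: int, dim: int):
        super().__init__()
        # Mix the flattened (n x n) interaction matrix into the next
        # layer's embeddings. "Wider" = larger num_emb_out / dim;
        # "taller" = more stacked blocks.
        self.proj = nn.Linear(num_emb_in * num_emb_in, num_emb_out * dim)
        self.num_emb_out = num_emb_out
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_emb_in, dim)
        inter = torch.bmm(x, x.transpose(1, 2))           # pairwise dot products: (batch, n, n)
        out = self.proj(inter.flatten(start_dim=1))       # mix interactions
        return out.view(-1, self.num_emb_out, self.dim)   # re-embed for the next block


# Stacking blocks composes interactions of interactions.
model = nn.Sequential(FMBlock(16, 16, 32), FMBlock(16, 16, 32))
x = torch.randn(8, 16, 32)  # batch of 8 examples, 16 feature embeddings of dim 32
print(model(x).shape)       # torch.Size([8, 16, 32])
```

Because each block forms dot products between its inputs, stacking L such blocks composes interactions of interactions, so the effective interaction order grows with depth (up to 2^L in this sketch), while widening corresponds to increasing `num_emb_out` and `dim`. This is the sense in which taller and wider layers can capture diverse, higher-order interactions.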