HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
February 20, 2026
Authors: Lei Xin, Yuhao Zheng, Ke Cheng, Changjiang Jiang, Zifan Zhang, Fanhu Zeng
cs.AI
Abstract
Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a hybrid attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval capabilities in industrial-scale settings involving tens of thousands of interactions. To mitigate the lag of the linear layers in capturing rapid interest drifts, we further design a Temporal-Aware Delta Network (TADN) that dynamically upweights fresh behavioral signals while suppressing historical noise. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed while significantly outperforming strong baselines, delivering over an 8% improvement in Hit Rate for users with ultra-long sequences, with excellent efficiency.
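The hybrid split described above can be sketched as follows. This is a minimal illustrative implementation, not HyTRec's actual code: the long historical prefix is compressed into a constant-size linear-attention state, while the most recent interactions receive exact softmax attention. The feature map, window size `recent`, and the simple averaging of the two branches are all assumptions; the paper's merging/gating scheme is not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(q, k, v, recent=64):
    """Hypothetical sketch: linear attention over the historical prefix,
    softmax attention over the last `recent` interactions."""
    T, d = q.shape
    split = max(T - recent, 0)
    # Linear branch: compress history into a d x d state S = sum_t phi(k_t)^T v_t,
    # so memory cost is independent of the history length.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6            # a simple positive feature map (assumed)
    S = phi(k[:split]).T @ v[:split]                     # (d, d) running state
    z = phi(k[:split]).sum(axis=0)                       # (d,) normalizer
    lin_out = phi(q) @ S / (phi(q) @ z + 1e-6)[:, None]  # (T, d)
    # Softmax branch: exact attention restricted to the recent window,
    # restoring precise retrieval for fresh interactions.
    scores = q @ k[split:].T / np.sqrt(d)                # (T, recent)
    soft_out = softmax(scores, axis=-1) @ v[split:]
    # Merge the two branches (plain average here, purely for illustration).
    return 0.5 * (lin_out + soft_out)
```

Because the linear state `S` is a fixed `d x d` matrix regardless of sequence length, inference cost stays linear in `T`, while the softmax window keeps retrieval exact where it matters most, over recent behavior.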