Implicit Reasoning for Large Language Model-based Generative Recommendation

June 15, 2026
Authors: Yinhan He, Liam Collins, Bhuvesh Kumar, Jundong Li, Neil Shah, Donald Loveland
cs.AI

Abstract

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet how to reliably invoke this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), which disrupt the LLM's natural-language reasoning interface because these tokens are unseen during pretraining. Existing approaches address this with expensive multi-stage pipelines that ground SIDs and elicit explicit rationales, but they offer limited insight into when and why each stage is necessary. In this work, we systematically decompose explicit reasoning training pipelines for LLM-based GR and reveal three key limitations: weakened world-knowledge verbalization, misalignment between the SID and natural-language token embedding spaces, and sensitivity to rationale quality, all of which hurt explicit reasoning performance. To circumvent these issues, we propose PauseRec, a lightweight implicit reasoning paradigm tailored for GR. PauseRec avoids costly reasoning trace acquisition and reasoning alignment training, which yields three benefits: (1) it outperforms standard explicit chain-of-thought (CoT) methods by up to 6.22%, (2) it reduces training cost by up to 65% in GPU hours, and (3) it speeds up inference by up to 71.3%. These results position PauseRec as a lightweight alternative to explicit rationale generation, enabling more effective and efficient LLM-based GR.