Implicit Reasoning in LLM-Based Generative Recommendation

Abstract

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet how to reliably invoke this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs). Because the LLM never sees these tokens during pretraining, they disrupt its natural-language reasoning interface. Existing approaches address this with expensive multi-stage pipelines that ground SIDs and elicit explicit rationales, but they offer limited insight into when and why each stage is necessary. In this work, we systematically decompose explicit reasoning training pipelines for LLM-based GR and identify three key limitations: weakened verbalization of world knowledge, misalignment between the SID and natural-language token embedding spaces, and sensitivity to rationale quality. All three hurt explicit reasoning performance. To circumvent these issues, we propose PauseRec, a lightweight implicit reasoning paradigm tailored for GR. PauseRec avoids both costly reasoning-trace acquisition and reasoning-alignment training, which yields three benefits: (1) it outperforms standard explicit chain-of-thought (CoT) methods by up to 6.22%; (2) it reduces training cost by up to 65% in GPU hours; and (3) it speeds up inference by up to 71.3%. These results position PauseRec as a lightweight alternative to explicit rationale generation, enabling more effective and efficient LLM-based GR.