Implicit Reasoning for Large Language Model-Based Generative Recommendation

Abstract

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet how to reliably invoke this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs). Because these tokens are never seen during pretraining, they disrupt the LLM's natural-language reasoning interface. Existing approaches address this with expensive multi-stage pipelines that ground SIDs and elicit explicit rationales, but they offer limited insight into when and why each stage is necessary. In this work, we systematically decompose explicit reasoning training pipelines for LLM-based GR and identify three key limitations, all of which hurt explicit reasoning performance: weakened world-knowledge verbalization, misalignment between the SID and natural-language token embedding spaces, and sensitivity to rationale quality. To circumvent these issues, we propose PauseRec, a lightweight implicit reasoning paradigm tailored for GR. Because PauseRec requires neither costly reasoning-trace acquisition nor reasoning-alignment training, it offers three practical benefits: (1) it outperforms standard explicit chain-of-thought (CoT) methods by up to 6.22%, (2) it reduces training cost by up to 65% in GPU hours, and (3) it speeds up inference by up to 71.3%. These results position PauseRec as a lightweight alternative to explicit rationale generation, enabling more effective and efficient LLM-based GR.