Implicit Reasoning for Large Language Model-Based Generative Recommendation

Abstract

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet how to reliably invoke this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs). Because these tokens are never seen by the LLM during pretraining, they disrupt its natural-language reasoning interface. Existing approaches address this with expensive multi-stage pipelines that ground SIDs and elicit explicit rationales, but they offer limited insight into when and why each stage is necessary. In this work, we systematically decompose explicit reasoning training pipelines for LLM-based GR and reveal three key limitations, all of which hurt explicit reasoning performance: weakened world-knowledge verbalization, misalignment between the SID and natural-language token embedding spaces, and sensitivity to rationale quality. To circumvent these issues, we propose PauseRec, a lightweight implicit reasoning paradigm tailored for GR. By avoiding costly reasoning-trace acquisition and reasoning-alignment training, PauseRec is highly practical and offers several benefits: (1) it outperforms standard explicit chain-of-thought (CoT) methods by up to 6.22%, (2) it reduces training cost by up to 65% in GPU hours, and (3) it speeds up inference by up to 71.3%. These results position PauseRec as a lightweight alternative to explicit rationale generation, enabling more effective and efficient LLM-based GR.