Item-Language Model for Conversational Recommendation

Abstract

Large language models (LLMs) have been extremely successful at tasks such as complex dialogue understanding, reasoning, and coding, owing to their emergent abilities. These abilities have been extended through multi-modality to cover image, audio, and video inputs. Recommender systems, on the other hand, have been critical for information-seeking and item-discovery needs. Recently, there have been attempts to apply LLMs to recommendation. One difficulty with current attempts is that the underlying LLM is usually not trained on recommender system data, which largely consists of user interaction signals and is often not publicly available. Another difficulty is that user interaction signals often follow a different pattern from natural language text, and it is currently unclear whether the LLM training setup can learn more non-trivial knowledge from interaction signals than traditional recommender system methods can. Finally, it is difficult to train multiple LLMs for different use cases, and to retain the original language and reasoning abilities when learning from recommender system data. To address these three limitations, we propose the Item-Language Model (ILM), which is composed of an item encoder that produces text-aligned item representations encoding user interaction signals, and a frozen LLM that can understand those item representations while preserving its pretrained knowledge. We conduct extensive experiments that demonstrate the importance of both language alignment and user interaction knowledge in the item encoder.
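The two-part design described in the abstract (a trainable item encoder feeding a frozen LLM) can be illustrated with a minimal, hedged sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the class names `ItemEncoder` and `FrozenTokenEmbedding`, the `<item>` placeholder token, the single linear projection used as the encoder, and the toy dimensions are all hypothetical. The sketch only shows how item vectors derived from interaction data could be mapped into the same embedding space as text tokens and interleaved with them, while the LLM-side embedding table is never updated.

```python
import random


def make_matrix(rows, cols, seed):
    """Small random matrix as a list of rows (toy stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ItemEncoder:
    """Hypothetical trainable adapter (assumption: a single linear projection).

    Maps an interaction-based item embedding into the LLM's token-embedding space.
    """

    def __init__(self, item_dim, model_dim, seed=0):
        self.weight = make_matrix(item_dim, model_dim, seed)  # the only trainable part

    def encode(self, item_embedding):
        model_dim = len(self.weight[0])
        return [
            sum(x * row[j] for x, row in zip(item_embedding, self.weight))
            for j in range(model_dim)
        ]


class FrozenTokenEmbedding:
    """Stand-in for the frozen LLM's input embedding table (never updated)."""

    def __init__(self, vocab, model_dim, seed=1):
        self.vocab = {token: index for index, token in enumerate(vocab)}
        self._table = make_matrix(len(vocab), model_dim, seed)

    def embed(self, token):
        return list(self._table[self.vocab[token]])


def build_input_sequence(prompt, item_embeddings, encoder, token_embedding):
    """Replace each hypothetical "<item>" placeholder with an encoded item vector."""
    items = iter(item_embeddings)
    sequence = []
    for token in prompt:
        if token == "<item>":
            sequence.append(encoder.encode(next(items)))
        else:
            sequence.append(token_embedding.embed(token))
    return sequence


# Usage: one item (4-dim interaction embedding) placed in a 3-token text prompt.
vocab = ["recommend", "something", "like"]
encoder = ItemEncoder(item_dim=4, model_dim=8)
embedding = FrozenTokenEmbedding(vocab, model_dim=8)
prompt = ["recommend", "something", "like", "<item>"]
sequence = build_input_sequence(prompt, [[0.5, -0.2, 0.1, 0.9]], encoder, embedding)
print(len(sequence), len(sequence[0]))  # prints: 4 8
```

In a setup like this, only the encoder's weights would receive gradient updates, which is one way to read the abstract's claim that the LLM's pretrained language and reasoning abilities are preserved.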
