Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior remains an emerging and still open research question. Traditionally, Collaborative Filtering (CF) has been the most effective method for these tasks, relying predominantly on a large volume of rating data. In contrast, LLMs typically require considerably less data while maintaining exhaustive world knowledge about each item, such as movies or products. In this paper, we conduct a thorough examination of both CF and LLMs on the classic task of user rating prediction, which involves predicting a user's rating for a candidate item based on their past ratings. We investigate LLMs of different sizes, ranging from 250M to 540B parameters, and evaluate their performance in zero-shot, few-shot, and fine-tuning scenarios. We conduct a comprehensive analysis comparing LLMs with strong CF methods and find that zero-shot LLMs lag behind traditional recommender models that have access to user interaction data, indicating the importance of user interaction data. However, through fine-tuning, LLMs achieve comparable or even better performance with only a small fraction of the training data, demonstrating their potential through data efficiency.
