Privacy-Preserving Recommender Systems with Synthetic Query Generation using Differentially Private Large Language Models

Abstract

We propose a novel approach for developing privacy-preserving large-scale recommender systems using differentially private (DP) large language models (LLMs), which overcomes certain challenges and limitations in DP training these complex systems. Our method is particularly well suited to the emerging area of LLM-based recommender systems, but can be readily applied to any recommender system that processes representations of natural language inputs. Our approach uses DP training methods to fine-tune a publicly pre-trained LLM on a query generation task. The resulting model can generate private synthetic queries representative of the original queries, which can be freely shared for any downstream non-private recommendation training procedure without incurring any additional privacy cost. We evaluate our method on its ability to securely train effective deep retrieval models, and we observe significant improvements in retrieval quality, without compromising query-level privacy guarantees, compared to methods in which the retrieval models are directly DP trained.
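The abstract does not name the specific DP training method. As an illustration only, the sketch below assumes DP-SGD, a widely used choice, and shows its core aggregation step: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. The function name `dp_sgd_aggregate` and its parameters are hypothetical, and a real fine-tuning run would also need a privacy accountant to track the cumulative privacy budget.

```python
import math
import random


def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Return a noisy average of per-example gradients (one DP-SGD style step).

    Hypothetical helper for illustration; not the method confirmed by the paper.
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    if rng is None:
        rng = random.Random()

    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale

    # Add Gaussian noise with standard deviation noise_multiplier * clip_norm
    # to each coordinate of the clipped sum, then average over the batch.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]
```

Because differential privacy is preserved under post-processing, anything computed from a model trained this way, including the synthetic queries it generates, carries the same guarantee. This is why the downstream retrieval model can be trained non-privately at no additional privacy cost.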
