Interactive Recommendation Agent with Active User Commands

Abstract

Traditional recommender systems rely on passive feedback mechanisms that limit users to simple choices such as like and dislike. These coarse-grained signals fail to capture users' nuanced behavioral motivations and intentions. As a result, current systems also cannot identify which specific item attributes drive user satisfaction or dissatisfaction, which leads to inaccurate preference modeling. These fundamental limitations create a persistent gap between user intentions and system interpretations, ultimately undermining user satisfaction and harming system effectiveness. To address these limitations, we introduce the Interactive Recommendation Feed (IRF), a pioneering paradigm that enables natural language commands within mainstream recommendation feeds. Unlike traditional systems, which confine users to passive, implicit influence through their behavior, IRF gives users active, explicit control over recommendation policies through real-time linguistic commands. To support this paradigm, we develop RecBot, a dual-agent architecture in which a Parser Agent transforms linguistic expressions into structured preferences and a Planner Agent dynamically orchestrates adaptive tool chains for on-the-fly policy adjustment. To enable practical deployment, we employ simulation-augmented knowledge distillation to achieve efficient performance while maintaining strong reasoning capabilities. Through extensive offline and long-term online experiments, RecBot shows significant improvements in both user satisfaction and business outcomes.
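The abstract does not specify RecBot's interfaces, so the following is only a minimal, illustrative sketch of the parse-then-plan flow it describes. Everything here is an assumption made for illustration: the `Preference` structure, the keyword-based `parse_command` (a toy stand-in for an LLM-driven Parser Agent), the `filter_tool` and `boost_tool` functions, and the item record layout are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical trigger words; a real parser would presumably use a language model.
NEGATIVE = {"no", "not", "less", "without"}
POSITIVE = {"more", "show", "want"}
FILLER = {"me", "i", "the", "any"}


@dataclass
class Preference:
    """Hypothetical structured form of a user command."""
    wanted: set[str] = field(default_factory=set)    # attributes to favor
    unwanted: set[str] = field(default_factory=set)  # attributes to avoid


def parse_command(command: str) -> Preference:
    """Toy stand-in for a Parser Agent: keyword rules, one clause at a time."""
    pref = Preference()
    for clause in command.lower().replace(",", " and ").split(" and "):
        words = clause.split()
        attrs = {w for w in words if w not in NEGATIVE | POSITIVE | FILLER}
        if NEGATIVE & set(words):
            pref.unwanted |= attrs
        elif POSITIVE & set(words):
            pref.wanted |= attrs
    return pref


def filter_tool(items: list[dict], pref: Preference) -> list[dict]:
    """Drop items that carry any unwanted attribute."""
    return [it for it in items if not (it["attrs"] & pref.unwanted)]


def boost_tool(items: list[dict], pref: Preference) -> list[dict]:
    """Raise each item's score by the number of wanted attributes it matches."""
    return [{**it, "score": it["score"] + len(it["attrs"] & pref.wanted)}
            for it in items]


def plan_tools(pref: Preference) -> list:
    """Toy stand-in for a Planner Agent: pick only the tools the command needs."""
    chain = []
    if pref.unwanted:
        chain.append(filter_tool)
    if pref.wanted:
        chain.append(boost_tool)
    return chain


def recommend(command: str, items: list[dict]) -> list[dict]:
    """Parse the command, run the planned tool chain, and rank by score."""
    pref = parse_command(command)
    for tool in plan_tools(pref):
        items = tool(items, pref)
    return sorted(items, key=lambda it: it["score"], reverse=True)
```

With a hypothetical catalog of three items (white sneakers, a red dress, a blue dress), calling `recommend("show more red dresses and no sneakers", items)` would drop the sneakers and rank the red dress above the blue one. The point of the sketch is the separation of concerns: the parser produces a structured preference, and the planner decides which tools to chain based on what that preference contains.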
