GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Abstract

Recent advances in large language models (LLMs) have enabled increasingly capable chatbots. However, most existing systems focus on single-user settings and do not generalize well to multi-user group chats, where an agent must intervene more proactively and accurately under complex, evolving contexts. Existing approaches typically rely on LLMs for both reasoning and generation, which leads to high token consumption, limited scalability, and potential privacy risks. To address these challenges, we propose GroupGPT, a token-efficient and privacy-preserving agentic framework for multi-user chat assistants. GroupGPT adopts a small-large model collaborative architecture that decouples intervention timing from response generation, enabling efficient and accurate decision-making. The framework also supports multimodal inputs, including memes, images, videos, and voice messages. We further introduce MUIR, a benchmark dataset for multi-user chat assistant intervention reasoning. MUIR contains 2,500 annotated group chat segments with intervention labels and rationales, supporting evaluation of both timing accuracy and response quality. We evaluate a range of models on MUIR, from large language models to smaller counterparts. Extensive experiments demonstrate that GroupGPT produces accurate and well-timed responses, achieving an average score of 4.72/5.0 in LLM-based evaluation, and that it is well received by users across diverse group chat scenarios. Moreover, GroupGPT reduces token usage by up to a factor of 3 compared with baseline methods, while sanitizing private information in user messages before cloud transmission. Code is available at: https://github.com/Eliot-Shen/GroupGPT
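
The decoupling of intervention timing from response generation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the GroupGPT codebase: the names `Message`, `sanitize`, `should_intervene`, and `handle` are hypothetical, a keyword heuristic stands in for the small model, and two regular expressions stand in for the privacy sanitizer. The point is only the control flow: a cheap local gate decides whether to respond, and the large model sees sanitized text only when the gate fires.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    text: str


# Hypothetical redaction patterns; a real sanitizer would cover far more cases.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def sanitize(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def should_intervene(window: List[Message]) -> bool:
    """Stand-in for the small model: a keyword heuristic on the latest message."""
    last = window[-1].text.lower()
    return "@assistant" in last or last.rstrip().endswith("?")


def handle(window: List[Message], generate: Callable[[str], str]) -> Optional[str]:
    """Call the large model only when the local gate decides to intervene."""
    if not should_intervene(window):
        return None  # no large-model call, so no tokens are spent
    # Only sanitized text is placed in the prompt sent to the large model.
    prompt = "\n".join(f"{m.sender}: {sanitize(m.text)}" for m in window)
    return generate(prompt)
```

In this sketch `generate` is any callable that wraps a large-model API. Because the gate and the sanitizer both run before that call, messages that do not warrant a reply never leave the local side, which is one plausible reading of how such a design could save tokens and limit exposure of private data.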
