Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents

Abstract

Smartphones bring significant convenience to users, but they also allow devices to extensively record various types of personal information. Existing smartphone agents powered by Multimodal Large Language Models (MLLMs) have achieved remarkable performance in automating different tasks. This comes at a cost, however: these agents are granted substantial access to users' sensitive personal information during operation. To gain a thorough understanding of the privacy awareness of these agents, we present, to the best of our knowledge, the first large-scale benchmark, encompassing 7,138 scenarios. For the privacy context in each scenario, we also annotate its type (e.g., Account Credentials), sensitivity level, and location. We then carefully benchmark seven available mainstream smartphone agents. Our results show that almost all benchmarked agents exhibit unsatisfactory privacy awareness (RA), with performance remaining below 60% even with explicit hints. Overall, closed-source agents show better privacy ability than open-source ones, and Gemini 2.0-flash performs best, with an RA of 67%. We also find that the agents' privacy detection capability is strongly related to the scenario's sensitivity level, i.e., scenarios with a higher sensitivity level are typically easier to identify. We hope these findings encourage the research community to rethink the unbalanced utility-privacy tradeoff in smartphone agents. Our code and benchmark are available at https://zhixin-l.github.io/SAPA-Bench.
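To make the annotation scheme and the per-sensitivity analysis more concrete, here is a minimal Python sketch. It assumes that RA is the fraction of scenarios in which an agent recognizes the privacy risk; the abstract does not define it, so treat that as an assumption. The field names, the ordinal sensitivity scale, the location values and the toy records are all hypothetical. They are not the benchmark's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical annotation record mirroring the abstract's three labels
    # (privacy type, sensitivity level, location) plus the agent's outcome.
    privacy_type: str    # e.g. "Account Credentials"
    sensitivity: int     # hypothetical ordinal scale: 1 (low) to 3 (high)
    location: str        # hypothetical values, e.g. "screen" or "instruction"
    agent_flagged: bool  # did the agent recognize the privacy risk?


def risk_awareness(scenarios):
    """Assumed RA definition: share of scenarios where the agent flagged the risk."""
    if not scenarios:
        return 0.0
    return sum(s.agent_flagged for s in scenarios) / len(scenarios)


def ra_by_sensitivity(scenarios):
    """Group scenarios by sensitivity level and compute RA for each level."""
    groups = defaultdict(list)
    for s in scenarios:
        groups[s.sensitivity].append(s)
    return {level: risk_awareness(group) for level, group in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up toy records for illustration only, not results from the benchmark.
    toy = [
        Scenario("Account Credentials", 3, "screen", True),
        Scenario("Account Credentials", 3, "instruction", True),
        Scenario("Location", 2, "screen", False),
        Scenario("Location", 2, "screen", True),
        Scenario("Browsing History", 1, "screen", False),
    ]
    print(f"Overall RA: {risk_awareness(toy):.0%}")
    print(ra_by_sensitivity(toy))
```

On the toy set this prints an overall RA of 60% followed by `{1: 0.0, 2: 0.5, 3: 1.0}`. Grouping by sensitivity level is one simple way to expose the kind of relationship the abstract reports between a scenario's sensitivity and how often agents detect it.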
