Known by Their Actions: Fingerprinting LLM Browser Agents via UI Traces

Abstract

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? If so, this would pose a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from only a few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces at https://github.com/KabakaWilliam/known_actions.
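To make the timing signal and the delay defence concrete, the following is a minimal sketch, not the released harness. It assumes a trace is a time-ordered list of `(timestamp_ms, action_type)` pairs; the function names, the feature set, and the uniform delay distribution are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, pstdev


def timing_features(events):
    """Summarise one trace (assumed format: time-ordered (timestamp_ms, action_type) pairs)."""
    # Gaps between consecutive actions carry the timing signal.
    gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
    counts = {}
    for _, action in events:
        counts[action] = counts.get(action, 0) + 1
    return {
        "n_events": len(events),
        "mean_gap_ms": mean(gaps) if gaps else 0.0,
        "std_gap_ms": pstdev(gaps) if gaps else 0.0,
        "action_counts": counts,
    }


def inject_delays(events, max_delay_ms, rng):
    """Hypothetical defence: add a cumulative uniform random delay before each later action."""
    shifted, offset = [], 0.0
    for i, (timestamp, action) in enumerate(events):
        if i > 0:
            offset += rng.uniform(0, max_delay_ms)
        shifted.append((timestamp + offset, action))
    return shifted


trace = [(0, "click"), (100, "type"), (300, "click")]
features = timing_features(trace)
delayed = inject_delays(trace, max_delay_ms=500, rng=random.Random(0))
```

Feature dictionaries of this kind could be fed to any standard classifier. The delays change the gap statistics but leave the action sequence untouched, which is one plausible reason a classifier retrained on delayed traces can recover.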