Known by Their Actions: Fingerprinting LLM Browser Agents via UI Traces

Abstract

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Such identification would pose a significant security risk, because it would enable attacks tailored to a model's known vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from only a few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces at https://github.com/KabakaWilliam/known_actions.
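
As a rough illustration of the pipeline the abstract describes (trace features, a classifier, and a timing-delay defence), the following minimal sketch may help. It is not the authors' implementation: the trace format, the feature set, the action names, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free, and the traces in any usage would be synthetic.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical action vocabulary; the real tracker's event types are not given here.
ACTION_TYPES = ("click", "scroll", "keydown")


def featurize(trace):
    """Turn a trace into a small feature vector.

    trace: list of (timestamp_ms, action_type) tuples sorted by time
    (an assumed format). Features: mean and spread of the gaps between
    consecutive actions, in seconds, plus the share of each action type.
    """
    gaps = [(b[0] - a[0]) / 1000.0 for a, b in zip(trace, trace[1:])]
    counts = Counter(action for _, action in trace)
    total = max(len(trace), 1)  # avoid division by zero on an empty trace
    return [
        statistics.mean(gaps) if gaps else 0.0,
        statistics.pstdev(gaps) if gaps else 0.0,
        *[counts[a] / total for a in ACTION_TYPES],
    ]


def fit_centroids(labelled_traces):
    """Average the feature vectors per label (a nearest-centroid stand-in
    for whatever classifier the paper actually uses)."""
    by_label = {}
    for label, trace in labelled_traces:
        by_label.setdefault(label, []).append(featurize(trace))
    return {
        label: [statistics.mean(col) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, trace):
    """Return the label whose centroid is closest in Euclidean distance."""
    x = featurize(trace)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def add_jitter(trace, max_delay_ms, rng):
    """Delay-injection defence: add a uniform random delay before each action.

    Delays accumulate, so the order of actions is preserved while the
    gaps between them are perturbed.
    """
    shifted, offset = [], 0.0
    for t, action in trace:
        offset += rng.uniform(0, max_delay_ms)
        shifted.append((t + offset, action))
    return shifted
```

In this framing, the retraining result from the abstract corresponds to calling `fit_centroids` on traces that have already passed through `add_jitter`, so the classifier learns the perturbed timing distribution rather than the original one.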