PageGuide: Browser extension to assist users in navigating a webpage and locating information

April 26, 2026
Authors: Tin Nguyen, Thang T. Truong, Runtao Zhou, Trung Bui, Chirag Agarwal, Anh Totti Nguyen
cs.AI

Abstract

Users browsing the web daily struggle to quickly locate relevant information in cluttered pages, complete unfamiliar multi-step tasks, and stay focused amid distracting content. State-of-the-art AI assistants (e.g., ChatGPT, Gemini, Claude) and browser agents (e.g., OpenAI Operator, Browser Use) can answer questions and automate actions, yet they return answers without showing where the information comes from on the page, forcing users to manually verify results and blindly trust every automated step. We present PageGuide, a browser extension that grounds LLM answers directly in the HTML DOM via visual overlays, addressing three core user needs: (a) Find: locating and highlighting relevant evidence in situ so users can instantly verify answers on the page; (b) Guide: showing step-by-step instructions (e.g., how to change a password) one at a time so users can follow along and perform the actions themselves; and (c) Hide: hiding distracting content while letting users decide whether or not to hide each element. In a user study (N=94), PageGuide outperforms unaided browsing across all modes: Hide accuracy improves by 26 percentage points (an 86.7% relative gain) and task completion time drops by 70%; Guide completion rate increases by 30 percentage points; and Find reduces manual search effort, with Ctrl+F usage falling by 80% and task time decreasing by 19%. Code and demo are available at: pageguide.github.io.
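
The two Hide accuracy figures can be cross-checked against each other. The following is a worked calculation, assuming the 86.7% relative gain is measured against the unaided baseline; the abstract does not state that baseline, and the symbols a_0 (unaided accuracy) and a_1 (accuracy with PageGuide) are labels introduced here for illustration only.

```latex
\begin{align*}
a_1 - a_0 &= 26 \text{ percentage points}, &
\frac{a_1 - a_0}{a_0} &= 0.867 \\
a_0 &= \frac{26}{0.867} \approx 30\% \\
a_1 &= a_0 + 26 \approx 56\%
\end{align*}
```

Under that assumption, Hide accuracy would rise from roughly 30% to roughly 56%. These implied values are derived from the two reported figures and are not themselves reported in the abstract.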