AI for Service: Proactive Assistance with AI Glasses
October 16, 2025
Authors: Zichen Wen, Yiyu Wang, Chenfei Liao, Boxue Yang, Junxian Li, Weifeng Liu, Haocong He, Bolong Feng, Xuyang Liu, Yuanhuiyi Lyu, Xu Zheng, Xuming Hu, Linfeng Zhang
cs.AI
Abstract
In an era where AI is evolving from a passive tool into an active and
adaptive companion, we introduce AI for Service (AI4Service), a new paradigm
that enables proactive and real-time assistance in daily life. Existing AI
services remain largely reactive, responding only to explicit user commands. We
argue that a truly intelligent and helpful assistant should be capable of
anticipating user needs and taking actions proactively when appropriate. To
realize this vision, we propose Alpha-Service, a unified framework that
addresses two fundamental challenges: Know When to intervene by detecting
service opportunities from egocentric video streams, and Know How to provide
both generalized and personalized services. Inspired by the von Neumann
computer architecture and based on AI glasses, Alpha-Service consists of five
key components: an Input Unit for perception, a Central Processing Unit for
task scheduling, an Arithmetic Logic Unit for tool utilization, a Memory Unit
for long-term personalization, and an Output Unit for natural human
interaction. As an initial exploration, we implement Alpha-Service through a
multi-agent system deployed on AI glasses. Case studies, including a real-time
Blackjack advisor, a museum tour guide, and a shopping fit assistant,
demonstrate its ability to seamlessly perceive the environment, infer user
intent, and provide timely and useful assistance without explicit prompts.
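
The five-unit layout described above can be pictured as a simple pipeline. The following is a minimal, illustrative sketch only, not the authors' implementation: every name in it (`input_unit`, `central_processing_unit`, `TOOLS`, `MemoryUnit`, `output_unit`, `step`) is hypothetical, video frames are replaced by plain text descriptions, and the keyword triggers stand in for whatever model actually decides when to intervene.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryUnit:
    """Long-term personalization store (assumption: a plain list of notes)."""
    notes: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)


def input_unit(frame: str) -> str:
    """Perception stand-in. A real system would process egocentric video;
    here a 'frame' is already a text description of the scene."""
    return frame.strip().lower()


def central_processing_unit(observation: str) -> Optional[str]:
    """Task scheduling ('Know When'): return a task name if the scene
    suggests a service opportunity, otherwise None to stay silent."""
    triggers = {"blackjack": "card_advice", "painting": "museum_guide"}
    for keyword, task in triggers.items():
        if keyword in observation:
            return task
    return None


# Arithmetic Logic Unit ('Know How'): maps each task to a tool.
# The tools here are placeholders that return canned text.
TOOLS: Dict[str, Callable[[str, MemoryUnit], str]] = {
    "card_advice": lambda obs, mem: "Consider your hand total before hitting.",
    "museum_guide": lambda obs, mem: "Would you like background on this painting?",
}


def output_unit(message: str) -> str:
    """Output stand-in: format the reply for display on the glasses."""
    return f"[glasses] {message}"


def step(frame: str, memory: MemoryUnit) -> Optional[str]:
    """Run one frame through all five units; None means no intervention."""
    observation = input_unit(frame)
    task = central_processing_unit(observation)
    if task is None:
        return None
    reply = TOOLS[task](observation, memory)
    memory.remember(f"{task}: {observation}")
    return output_unit(reply)
```

Calling `step("User looking at a painting", MemoryUnit())` yields a proactive suggestion without any explicit user request, while a frame with no trigger returns `None`, which mirrors the distinction between knowing when to intervene and knowing how to act.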