AI for Service: Proactive Assistance with AI Glasses
October 16, 2025
Authors: Zichen Wen, Yiyu Wang, Chenfei Liao, Boxue Yang, Junxian Li, Weifeng Liu, Haocong He, Bolong Feng, Xuyang Liu, Yuanhuiyi Lyu, Xu Zheng, Xuming Hu, Linfeng Zhang
cs.AI
Abstract
In an era where AI is evolving from a passive tool into an active and
adaptive companion, we introduce AI for Service (AI4Service), a new paradigm
that enables proactive and real-time assistance in daily life. Existing AI
services remain largely reactive, responding only to explicit user commands. We
argue that a truly intelligent and helpful assistant should be capable of
anticipating user needs and taking actions proactively when appropriate. To
realize this vision, we propose Alpha-Service, a unified framework that
addresses two fundamental challenges: Know When to intervene by detecting
service opportunities from egocentric video streams, and Know How to provide
both generalized and personalized services. Inspired by the von Neumann
computer architecture and based on AI glasses, Alpha-Service consists of five
key components: an Input Unit for perception, a Central Processing Unit for
task scheduling, an Arithmetic Logic Unit for tool utilization, a Memory Unit
for long-term personalization, and an Output Unit for natural human
interaction. As an initial exploration, we implement Alpha-Service through a
multi-agent system deployed on AI glasses. Case studies, including a real-time
Blackjack advisor, a museum tour guide, and a shopping fit assistant,
demonstrate its ability to seamlessly perceive the environment, infer user
intent, and provide timely and useful assistance without explicit prompts.
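
The five-unit layout described above can be illustrated with a minimal sketch. This is not the authors' implementation: every class name, method, and the keyword-matching "perception" below are hypothetical stand-ins chosen for illustration, assuming only the roles the abstract assigns to each unit (perception, task scheduling, tool utilization, long-term personalization, and output).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Frame:
    """One moment from the egocentric stream (hypothetical type)."""
    description: str


class InputUnit:
    """Perception: decides whether a frame holds a service opportunity."""

    def __init__(self, triggers: Dict[str, str]):
        # Keyword -> task name; a toy stand-in for a real perception model.
        self.triggers = triggers

    def detect(self, frame: Frame) -> Optional[str]:
        text = frame.description.lower()
        for keyword, task in self.triggers.items():
            if keyword in text:
                return task
        return None  # "Know When": no opportunity, so stay silent


@dataclass
class MemoryUnit:
    """Long-term personalization, reduced here to a preference dictionary."""
    preferences: Dict[str, str] = field(default_factory=dict)


Tool = Callable[[Frame, MemoryUnit], str]


class ArithmeticLogicUnit:
    """Tool utilization: runs the tool registered for a task."""

    def __init__(self, tools: Dict[str, Tool]):
        self.tools = tools

    def run(self, task: str, frame: Frame, memory: MemoryUnit) -> str:
        return self.tools[task](frame, memory)


class OutputUnit:
    """Human interaction: collects messages instead of speaking them."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def say(self, message: str) -> None:
        self.spoken.append(message)


class CentralProcessingUnit:
    """Task scheduling: routes each frame through the other units."""

    def __init__(self, input_unit: InputUnit, alu: ArithmeticLogicUnit,
                 memory: MemoryUnit, output: OutputUnit):
        self.input_unit = input_unit
        self.alu = alu
        self.memory = memory
        self.output = output

    def step(self, frame: Frame) -> bool:
        """Process one frame; return True if assistance was offered."""
        task = self.input_unit.detect(frame)
        if task is None:
            return False
        # "Know How": combine a generic tool with stored preferences.
        self.output.say(self.alu.run(task, frame, self.memory))
        return True


def museum_guide(frame: Frame, memory: MemoryUnit) -> str:
    """Hypothetical tool: tailors commentary length to the user."""
    style = memory.preferences.get("commentary", "detailed")
    return f"Offering {style} commentary on: {frame.description}"


def build_demo() -> CentralProcessingUnit:
    """Wire the five units together with one toy trigger and one tool."""
    return CentralProcessingUnit(
        InputUnit({"painting": "museum_guide"}),
        ArithmeticLogicUnit({"museum_guide": museum_guide}),
        MemoryUnit({"commentary": "short"}),
        OutputUnit(),
    )
```

In this sketch, calling `build_demo().step(Frame("user looks at a painting"))` produces one message, while a frame with no matching trigger produces none. That silence is the point of the "Know When" half of the design: the system acts only when an opportunity is detected, without an explicit prompt from the user.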