
Learning Personalized Agents from Human Feedback

February 18, 2026
Authors: Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Yuanshun Yao, Shaoliang Nie, Mingyang Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini
cs.AI

Abstract

Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts. Our theoretical analysis and empirical results show that integrating explicit memory with dual feedback channels is critical: PAHF learns substantially faster and consistently outperforms both no-memory and single-channel baselines, reducing initial personalization error and enabling rapid adaptation to preference shifts.
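The three-step loop described above can be sketched in a few lines of code. This is a minimal illustrative sketch, not the paper's implementation: the `PAHFAgent` class, its slot-based preference representation, and all identifiers are assumptions made here to show how pre-action clarification, memory-grounded action, and post-action feedback could fit together around an explicit per-user memory bank.

```python
from dataclasses import dataclass, field

@dataclass
class PAHFAgent:
    """Hypothetical sketch of the PAHF loop with an explicit per-user memory bank."""
    # user_id -> {slot: preferred value}; the slot/value scheme is an assumption.
    memory: dict = field(default_factory=dict)

    def clarify(self, user_id, task_slots):
        # (1) Pre-action clarification: ask only about slots with no stored preference.
        known = self.memory.setdefault(user_id, {})
        return [s for s in task_slots if s not in known]

    def act(self, user_id, task_slots, answers):
        # (2) Ground the action in retrieved memory plus fresh clarification answers.
        known = self.memory[user_id]
        known.update(answers)
        return {s: known.get(s, "default") for s in task_slots}

    def integrate_feedback(self, user_id, corrections):
        # (3) Post-action feedback: overwrite drifted preferences in the memory bank.
        self.memory[user_id].update(corrections)

agent = PAHFAgent()
slots = ["milk", "sweetness"]
questions = agent.clarify("u1", slots)  # new user: both slots are unknown
action = agent.act("u1", slots, {"milk": "oat", "sweetness": "low"})
agent.integrate_feedback("u1", {"sweetness": "none"})  # preference drift
```

Because the memory bank is explicit and per-user, a second call to `clarify` for the same user asks nothing, while the drifted preference takes effect immediately on the next action, mirroring the dual feedback channels the abstract credits for fast adaptation.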