Learning Personalized Agents from Human Feedback
February 18, 2026
Authors: Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Yuanshun Yao, Shaoliang Nie, Mingyang Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini
cs.AI
Abstract
Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts. Our theoretical analysis and empirical results show that integrating explicit memory with dual feedback channels is critical: PAHF learns substantially faster and consistently outperforms both no-memory and single-channel baselines, reducing initial personalization error and enabling rapid adaptation to preference shifts.
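The three-step loop from the abstract (pre-action clarification, memory-grounded action, post-action feedback) can be illustrated with a minimal sketch. All class and method names below are hypothetical: the abstract does not specify an implementation, so this only shows the control flow of an explicit per-user memory bank with dual feedback channels.

```python
# Minimal sketch of a PAHF-style loop (illustrative names, not the
# paper's actual implementation).

class PersonalizedAgent:
    """Agent with an explicit per-user memory bank of preferences."""

    def __init__(self):
        self.memory = {}  # user_id -> {preference_key: value}

    def needs_clarification(self, user_id, task):
        # Step 1: seek pre-action clarification when the task is
        # ambiguous, i.e. the relevant preference is not in memory.
        return task["preference_key"] not in self.memory.get(user_id, {})

    def record_clarification(self, user_id, key, value):
        self.memory.setdefault(user_id, {})[key] = value

    def act(self, user_id, task):
        # Step 2: ground the action in the preference retrieved
        # from the memory bank.
        pref = self.memory[user_id][task["preference_key"]]
        return f"{task['action']} ({pref})"

    def integrate_feedback(self, user_id, key, corrected_value):
        # Step 3: post-action feedback overwrites stale entries,
        # which handles preference drift.
        self.memory[user_id][key] = corrected_value


agent = PersonalizedAgent()
task = {"action": "buy coffee", "preference_key": "roast"}

if agent.needs_clarification("u1", task):      # new user: ask first
    agent.record_clarification("u1", "roast", "dark")
print(agent.act("u1", task))                   # buy coffee (dark)

agent.integrate_feedback("u1", "roast", "light")  # preference drift
print(agent.act("u1", task))                   # buy coffee (light)
```

The two feedback channels correspond to the two write paths into memory: `record_clarification` (before acting) and `integrate_feedback` (after acting); a no-memory baseline would drop both, and a single-channel baseline would drop one.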