Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks
April 13, 2026
Authors: Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia
cs.AI
Abstract
As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark that repurposes 18 existing datasets spanning personalization, problem-solving, and agentic tasks, using a downstream utility-driven metric for systematic evaluation. Our empirical analysis confirms that no single static extraction prompt dominates across all task categories, and that existing self-evolving prompt optimization frameworks, originally designed for homogeneous distributions, degrade when training tasks are heterogeneous. To address this, we propose CluE, a cluster-based self-evolving strategy that groups training examples into clusters by extraction scenarios, analyzes each cluster independently, and synthesizes cross-cluster insights to update the extraction prompt. Experiments on BEHEMOTH show that CluE generalizes effectively across heterogeneous tasks (+9.04% relative gain), consistently outperforming prior self-evolving frameworks.
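The cluster-analyze-synthesize loop that the abstract attributes to CluE can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the scenario labels, the per-cluster analysis, and the synthesis step are hypothetical stand-ins (the paper's versions are presumably LLM-driven), and all function and field names here are invented for illustration.

```python
# Hypothetical sketch of one CluE update step:
# (1) cluster training examples by extraction scenario,
# (2) analyze each cluster independently,
# (3) synthesize cross-cluster insights into the extraction prompt.
from collections import defaultdict


def cluster_by_scenario(examples):
    # Step 1: group examples by their extraction scenario label.
    clusters = defaultdict(list)
    for ex in examples:
        clusters[ex["scenario"]].append(ex)
    return clusters


def analyze_cluster(scenario, members):
    # Step 2: placeholder for per-cluster analysis (an LLM critique
    # of extraction failures in the actual method).
    return f"For {scenario} tasks, attend to cues seen in {len(members)} examples."


def synthesize(prompt, insights):
    # Step 3: placeholder for merging cross-cluster insights into the
    # extraction prompt (also LLM-driven in the actual method).
    return prompt + "\n\nGuidelines:\n" + "\n".join(f"- {i}" for i in insights)


def clue_step(prompt, training_examples):
    clusters = cluster_by_scenario(training_examples)
    insights = [analyze_cluster(s, m) for s, m in sorted(clusters.items())]
    return synthesize(prompt, insights)


examples = [
    {"scenario": "personalization", "dialogue": "..."},
    {"scenario": "problem-solving", "dialogue": "..."},
    {"scenario": "personalization", "dialogue": "..."},
]
new_prompt = clue_step("Extract useful memory from the conversation.", examples)
```

The key design point the abstract highlights is that analysis happens per cluster before synthesis, so heterogeneous task types do not dilute one another's feedback signal.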