

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

April 13, 2026
Authors: Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia
cs.AI

Abstract

As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark that repurposes 18 existing datasets spanning personalization, problem-solving, and agentic tasks, using a downstream utility-driven metric for systematic evaluation. Our empirical analysis confirms that no single static extraction prompt dominates across all task categories, and that existing self-evolving prompt optimization frameworks, originally designed for homogeneous distributions, degrade when training tasks are heterogeneous. To address this, we propose CluE, a cluster-based self-evolving strategy that groups training examples into clusters by extraction scenarios, analyzes each cluster independently, and synthesizes cross-cluster insights to update the extraction prompt. Experiments on BEHEMOTH show that CluE generalizes effectively across heterogeneous tasks (+9.04% relative gain), consistently outperforming prior self-evolving frameworks.
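The abstract describes CluE's loop at a high level: cluster training examples by extraction scenario, analyze each cluster independently, then merge the per-cluster insights into an updated extraction prompt. A minimal sketch of that control flow is below. Everything here is illustrative: the scenario labels, the `analyze_cluster` stub, and the prompt-merging format are assumptions, and in the actual method the clustering and per-cluster analysis would be performed by an LLM rather than by string operations.

```python
from collections import defaultdict

def cluster_by_scenario(examples):
    """Group training examples by extraction scenario.
    (CluE infers scenarios; here we assume labels are given.)"""
    clusters = defaultdict(list)
    for ex in examples:
        clusters[ex["scenario"]].append(ex)
    return dict(clusters)

def analyze_cluster(scenario, examples):
    """Stand-in for per-cluster analysis: in CluE this would be an
    LLM critique of extraction behavior within one scenario."""
    keys = ", ".join(ex["key_info"] for ex in examples)
    return f"{scenario}: remember {keys}"

def evolve_prompt(base_prompt, examples):
    """One self-evolution step: cluster, analyze each cluster
    independently, then synthesize cross-cluster insights
    into the extraction prompt."""
    clusters = cluster_by_scenario(examples)
    insights = [analyze_cluster(s, exs) for s, exs in sorted(clusters.items())]
    return base_prompt + "\nGuidelines:\n" + "\n".join(f"- {i}" for i in insights)

# Hypothetical training examples spanning the three task categories.
examples = [
    {"scenario": "personalization", "key_info": "user preferences"},
    {"scenario": "problem-solving", "key_info": "partial solutions"},
    {"scenario": "personalization", "key_info": "stated constraints"},
    {"scenario": "agentic", "key_info": "tool outcomes"},
]
prompt = evolve_prompt("Extract memory-worthy facts from the conversation.", examples)
print(prompt)
```

The key design point mirrored here is that each cluster is analyzed in isolation before any merging, so insights from one scenario (e.g. personalization) cannot be drowned out by a dominant scenario during the update, which is the failure mode the paper attributes to homogeneous-distribution self-evolving frameworks.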
PDF · April 24, 2026