ArcANE: Can Role-Playing Language Agents Stay in Character at the Right Time?

Abstract

Role-playing language agents (RPLAs) should portray characters whose values and behavior evolve as the story progresses, rather than maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter. They do not test whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases, covering both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc outperforms every other context mode on every model, and the gap is largest on scenarios outside the source text, where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which widen the Arc advantage further on scenarios outside the source text.
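To make the probe design described above more concrete, here is a minimal sketch of how a single probe could be expanded into one prompt per arc phase, so that the same scenario is posed against each stage of the character's psychological trajectory. It is an illustration under stated assumptions, not the benchmark's actual implementation. The classes `ArcPhase`, `CharacterArc`, and `Probe`, the function `expand_probe`, the prompt template, and the example character are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical schema: field names are illustrative, not ArcANE's actual data format.
@dataclass
class ArcPhase:
    label: str   # short name for the phase, e.g. "early"
    state: str   # the character's psychological state during this phase


@dataclass
class CharacterArc:
    character: str
    axis: str                # psychological axis along which the arc is segmented
    phases: List[ArcPhase]   # ordered by story progression


@dataclass
class Probe:
    scenario: str
    in_source: bool          # True if the situation occurs in the novel, False if beyond it


def expand_probe(arc: CharacterArc, probe: Probe) -> List[dict]:
    """Pose the same scenario once per phase, conditioning each prompt on that phase."""
    items = []
    for index, phase in enumerate(arc.phases):
        # Hypothetical prompt template; only the phase-specific context changes.
        prompt = (
            f"You are {arc.character}. Axis: {arc.axis}. "
            f"Current phase ({phase.label}): {phase.state}\n"
            f"Scenario: {probe.scenario}\n"
            "Respond in character."
        )
        items.append({"phase_index": index, "in_source": probe.in_source, "prompt": prompt})
    return items


# Usage with a made-up character and an out-of-source scenario.
arc = CharacterArc(
    character="Example Character",
    axis="trust in others",
    phases=[
        ArcPhase("early", "naive and trusting"),
        ArcPhase("late", "guarded and suspicious"),
    ],
)
probe = Probe("A stranger asks to borrow money.", in_source=False)
for item in expand_probe(arc, probe):
    print(item["phase_index"], item["in_source"])
```

Holding the scenario fixed while varying only the phase context means any difference between responses can be attributed to where the character stands on the arc rather than to the situation itself.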