ArcANE: Can Role-Playing Language Agents Stay in Character at the Right Time?

Abstract

Role-playing language agents (RPLAs) should portray characters whose values and behavior evolve as the story progresses, rather than maintain a fixed persona. Existing benchmarks measure only factual recall at a given chapter; they do not test whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario at every phase, covering both situations that appear in the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc outperforms every other context strategy on every model, and the gap is largest on scenarios outside the source text, where retrieval has nothing to find. We also fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which further widen the Arc advantage on scenarios outside the source text.