ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Abstract

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, rather than maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter; they do not assess whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases. The scenarios cover both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc outperforms every other context strategy on every model, and the gap is largest on scenarios outside the source text, where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which further widen the Arc advantage on scenarios outside the source text.