ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Abstract

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, rather than maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter; they do not evaluate whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across multiple phases, covering both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc outperforms every other context strategy on every model, and the gap is largest on scenarios outside the source text, where retrieval has nothing to find. We also fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which extend the Arc advantage further on scenarios outside the source text.