SPASM: マルチターン対話生成のための安定したペルソナ駆動型エージェントシミュレーション

要旨

大規模言語モデルは、教育、サポート、カウンセリングなどのマルチターン設定で展開される機会が増えており、長期的な一貫性を保つには、役割、人物像、目標の持続性が信頼性の鍵となる。この要件は、LLMが訓練・評価用の合成対話を生成する際に特に重要である。LLM同士の対話では、人物像の浮動（persona drift）、役割の混同、一方のエージェントが相手を次第に模倣する「エコー現象」といった、アイデンティティ関連の失敗が蓄積する傾向があるからだ。本研究では、SPASM（Stable Persona-driven Agent Simulation for Multi-turn dialogue generation）を提案する。これはモジュール型で安定性を重視したフレームワークであり、シミュレーションを以下の3段階に分解する：（i）スキーマサンプリング、妥当性検証、自然言語による人物像構築を通じた人物像の作成、（ii）クライアントとレスポンダーの対話生成、（iii）首尾一貫した終了のための終了検出。モデルの重みを変更せずに長期的な安定性を向上させるため、自己中心的な文脈投影（Egocentric Context Projection: ECP）を提案する。対話履歴は視点に依存しない表現で保存され、生成前に各エージェントの自己中心的視点に決定的に投影される。3つのLLM基盤モデル（GPT-4o-mini、DeepSeek-V3.2、Qwen-Plus）と9組のクライアント・レスポンダーペアを用いて、4,500の人物像と45,000の対話（各ペアあたり500人物像×10対話）からなるデータセットを構築した。アブレーション研究により、ECPが人物像の浮動を大幅に抑制し、人間による検証の下ではエコー現象を排除することが示された。埋め込み分析では人物像の構造が再現され、レスポンダー主導の強い相互作用の幾何学構造が明らかになった。コードはhttps://github.com/lhannnn/SPASM で公開している。

English

Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and evaluation, since LLM--LLM conversations can accumulate identity-related failures such as persona drift, role confusion, and "echoing", where one agent gradually mirrors its partner. We introduce SPASM (Stable Persona-driven Agent Simulation for Multi-turn dialogue generation), a modular, stability-first framework that decomposes simulation into (i) persona creation via schema sampling, plausibility validation, and natural-language persona crafting, (ii) Client--Responder dialogue generation, and (iii) termination detection for coherent stopping. To improve long-horizon stability without changing model weights, we propose Egocentric Context Projection (ECP): dialogue history is stored in a perspective-agnostic representation and deterministically projected into each agent's egocentric view before generation. Across three LLM backbones (GPT-4o-mini, DeepSeek-V3.2, Qwen-Plus) and nine Client--Responder pairings, we construct a dataset of 4,500 personas and 45,000 conversations (500 personas X 10 conversations per pairing). Ablations show ECP substantially reduces persona drift and, under human validation, eliminates echoing; embedding analyses recover persona structure and reveal strong responder-driven interaction geometry. Our code is available at https://github.com/lhannnn/SPASM.

SPASM: マルチターン対話生成のための安定したペルソナ駆動型エージェントシミュレーション

SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation

要旨

Support