SPASM: 다중 턴 대화 생성을 위한 안정적 페르소나 기반 에이전트 시뮬레이션

초록

대규모 언어 모델은 교육, 지원, 상담 등 다중 턴(multi-turn) 환경에서 점점 더 많이 배포되고 있으며, 이러한 환경에서의 신뢰성은 장기적인 대화 흐름에서 일관된 역할, 페르소나, 목표를 유지하는 것에 달려 있습니다. 이러한 요구사항은 LLM이 훈련 및 평가를 위한 합성 대화를 생성하는 데 사용될 때 특히 중요해지는데, 이는 LLM 간 대화가 페르소나 표류(persona drift), 역할 혼동, 그리고 한 에이전트가 점차 상대방을 닮아가는 "에코잉(echoing)"과 같은 정체성 관련 오류를 누적할 수 있기 때문입니다. 본 논문에서는 SPASM(Stable Persona-driven Agent Simulation for Multi-turn dialogue generation)을 소개합니다. 이는 모듈식이며 안정성을 최우선으로 하는 프레임워크로, 시뮬레이션을 (i) 스키마 샘플링, 타당성 검증, 자연어 페르소나 구축을 통한 페르소나 생성, (ii) 클라이언트-응답자(Client–Responder) 대화 생성, (iii) 일관된 종료를 위한 종료 감지의 세 단계로 분해합니다. 모델 가중치를 변경하지 않고 장기적 안정성을 향상시키기 위해 우리는 자기중심적 맥락 투영(Egocentric Context Projection, ECP)을 제안합니다: 대화 기록은 관점 중립적 표현으로 저장되고, 생성 전 각 에이전트의 자기중심적 관점으로 결정론적으로 투영됩니다. 세 가지 LLM 백본(GPT-4o-mini, DeepSeek-V3.2, Qwen-Plus)과 아홀 가지 클라이언트-응답자 조합을 통해, 우리는 4,500개의 페르소나와 45,000개의 대화(조합당 500개 페르소나 X 10개 대화)로 구성된 데이터셋을 구축했습니다. Ablation 실험은 ECP가 페르소나 표류를 상당히 줄이고, 인간 검증 하에서 에코잉을 제거함을 보여줍니다. 임베딩 분석은 페르소나 구조를 재확인하고 응답자 주도의 강한 상호작용 기하구조를 드러냅니다. 우리의 코드는 https://github.com/lhannnn/SPASM에서 이용 가능합니다.

English

Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and evaluation, since LLM--LLM conversations can accumulate identity-related failures such as persona drift, role confusion, and "echoing", where one agent gradually mirrors its partner. We introduce SPASM (Stable Persona-driven Agent Simulation for Multi-turn dialogue generation), a modular, stability-first framework that decomposes simulation into (i) persona creation via schema sampling, plausibility validation, and natural-language persona crafting, (ii) Client--Responder dialogue generation, and (iii) termination detection for coherent stopping. To improve long-horizon stability without changing model weights, we propose Egocentric Context Projection (ECP): dialogue history is stored in a perspective-agnostic representation and deterministically projected into each agent's egocentric view before generation. Across three LLM backbones (GPT-4o-mini, DeepSeek-V3.2, Qwen-Plus) and nine Client--Responder pairings, we construct a dataset of 4,500 personas and 45,000 conversations (500 personas X 10 conversations per pairing). Ablations show ECP substantially reduces persona drift and, under human validation, eliminates echoing; embedding analyses recover persona structure and reveal strong responder-driven interaction geometry. Our code is available at https://github.com/lhannnn/SPASM.

SPASM: 다중 턴 대화 생성을 위한 안정적 페르소나 기반 에이전트 시뮬레이션

SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation

초록

Support