Context-Value-Action Architecture for Value-Driven Large Language Model Agents

Abstract

Large Language Models (LLMs) have shown promise in simulating human behavior, yet existing agents often exhibit behavioral rigidity, a flaw frequently masked by the self-referential bias of current "LLM-as-a-judge" evaluations. By evaluating against empirical ground truth, we reveal a counter-intuitive phenomenon: increasing the intensity of prompt-driven reasoning does not improve fidelity but instead exacerbates value polarization, collapsing population diversity. To address this, we propose the Context-Value-Action (CVA) architecture, grounded in the Stimulus-Organism-Response (S-O-R) model and Schwartz's Theory of Basic Human Values. Unlike methods that rely on self-verification, CVA decouples action generation from cognitive reasoning via a novel Value Verifier, trained on authentic human data, that explicitly models dynamic value activation. Experiments on CVABench, which comprises over 1.1 million real-world interaction traces, demonstrate that CVA significantly outperforms baselines. Our approach effectively mitigates polarization while offering superior behavioral fidelity and interpretability.

