Context-Value-Action Architecture for Value-Driven Large Language Model Agents

Abstract

Large Language Models (LLMs) have shown promise in simulating human behavior, yet existing agents often exhibit behavioral rigidity, a flaw frequently masked by the self-referential bias of current "LLM-as-a-judge" evaluations. By evaluating against empirical ground truth, we reveal a counter-intuitive phenomenon: increasing the intensity of prompt-driven reasoning does not enhance fidelity but rather exacerbates value polarization, collapsing population diversity. To address this, we propose the Context-Value-Action (CVA) architecture, grounded in the Stimulus-Organism-Response (S-O-R) model and Schwartz's Theory of Basic Human Values. Unlike methods relying on self-verification, CVA decouples action generation from cognitive reasoning via a novel Value Verifier trained on authentic human data to explicitly model dynamic value activation. Experiments on CVABench, which comprises over 1.1 million real-world interaction traces, demonstrate that CVA significantly outperforms baselines. Our approach effectively mitigates polarization while offering superior behavioral fidelity and interpretability.
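
The abstract does not specify how the Value Verifier is implemented, so the following is only a minimal sketch of the decoupling idea under stated assumptions: candidate actions come from a separate generator, and a verifier picks the one whose expressed values best match the values activated by the context. The function names (`select_action`, `toy_activate`, `toy_express`), the dot-product scoring, and the toy weights are all hypothetical illustrations, not the authors' method. Only the value names `benevolence` and `achievement` come from Schwartz's theory.

```python
from typing import Callable, Dict, List

# A value vector maps a Schwartz value name to a weight (hypothetical encoding).
ValueVector = Dict[str, float]


def score(activation: ValueVector, expressed: ValueVector) -> float:
    """Dot product between context-activated values and the values an action expresses."""
    return sum(w * expressed.get(name, 0.0) for name, w in activation.items())


def select_action(
    context: str,
    candidates: List[str],
    activate: Callable[[str], ValueVector],
    express: Callable[[str], ValueVector],
) -> str:
    """Choose among pre-generated candidates; the verifier never generates text itself."""
    activation = activate(context)
    return max(candidates, key=lambda action: score(activation, express(action)))


# Toy stand-in for a learned verifier (assumption: the real one is trained on human data).
def toy_activate(context: str) -> ValueVector:
    if "friend" in context:
        return {"benevolence": 0.9, "achievement": 0.2}
    return {"benevolence": 0.2, "achievement": 0.9}


# Toy mapping from candidate actions to the values they express (made-up numbers).
TOY_PROFILES: Dict[str, ValueVector] = {
    "offer to help": {"benevolence": 1.0},
    "promote own work": {"achievement": 1.0},
}


def toy_express(action: str) -> ValueVector:
    return TOY_PROFILES.get(action, {})


candidates = list(TOY_PROFILES)
print(select_action("a friend asks for advice", candidates, toy_activate, toy_express))
print(select_action("a job interview", candidates, toy_activate, toy_express))
```

In this sketch the same candidate set yields different choices as the context changes, which is one simple way to picture "dynamic value activation" without asking the generator to verify its own output.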
