SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

Abstract

Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and lack theoretical grounding. We propose SAVOIR (ShApley Value fOr SocIal RL), a novel principled framework grounded in cooperative game theory. Our approach combines two complementary principles. First, expected utility shifts evaluation from retrospective attribution to prospective valuation, capturing an utterance's strategic potential for enabling favorable future trajectories. Second, Shapley values ensure fair credit distribution with axiomatic guarantees of efficiency, symmetry, and marginality. Experiments on the SOTOPIA benchmark demonstrate that SAVOIR achieves new state-of-the-art performance across all evaluation settings, with our 7B model matching or exceeding proprietary models including GPT-4o and Claude-3.5-Sonnet. Notably, even large reasoning models consistently underperform, suggesting that social intelligence requires capabilities qualitatively different from analytical reasoning.
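To make the Shapley-based credit distribution concrete, here is a minimal sketch, not the SAVOIR implementation. It computes exact Shapley values for three utterances by averaging each utterance's marginal contribution over all orderings. The utterance labels, the `toy_value` function, and its scores are hypothetical stand-ins for illustration; the abstract does not specify how the framework estimates the value of a set of utterances.

```python
from itertools import permutations
from math import isclose


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when added to the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical dialogue-outcome score for a subset of utterances."""
    score = 0.0
    if "greet" in coalition:
        score += 1.0
    if "offer" in coalition:
        score += 2.0
    if "greet" in coalition and "offer" in coalition:
        score += 1.0  # assumed synergy: the offer lands better after a greeting
    # "filler" never changes the score.
    return score


utterances = ["greet", "offer", "filler"]
phi = shapley_values(utterances, toy_value)
for name in utterances:
    print(f"{name}: {phi[name]:.2f}")
```

In this toy game the credits sum to the value of the full dialogue (the efficiency property named above), the synergy bonus is split evenly between the two utterances that create it, and the utterance that never changes the outcome receives zero credit. Exact enumeration grows factorially with the number of utterances, so longer dialogues would need a sampling-based approximation.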
