The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models

Abstract

Current large language models (LLMs) have demonstrated emerging capabilities in social intelligence tasks, including implicature resolution (Sravanthi et al., 2024) and theory-of-mind reasoning (Shapira et al., 2024), both of which require substantial pragmatic understanding. However, how LLMs acquire this competence over the course of training remains poorly understood. In this work, we introduce ALTPRAG, a dataset grounded in the pragmatic concept of alternatives and designed to evaluate whether LLMs at different training stages can accurately infer nuanced speaker intentions. Each instance pairs two contextually appropriate but pragmatically distinct continuations, enabling fine-grained assessment of both pragmatic interpretation and contrastive reasoning. To examine how pragmatic competence develops, we systematically evaluate 22 LLMs across three key training stages: pre-training, supervised fine-tuning (SFT), and preference optimization. Our results show that even base models exhibit notable sensitivity to pragmatic cues, and that this sensitivity improves consistently as model and data scale increase. Additionally, SFT and RLHF contribute further gains, particularly in cognitive-pragmatic reasoning. These findings highlight pragmatic competence as an emergent and compositional property of LLM training and offer new insights for aligning models with human communicative norms.
