Language Models Model Language

Abstract

Linguistic commentary on LLMs, heavily influenced by the theoretical frameworks of de Saussure and Chomsky, is often speculative and unproductive. Critics question whether LLMs can legitimately model language, arguing that "deep structure" or "grounding" is needed to achieve an idealized linguistic "competence." We argue for a radical shift in perspective toward the empiricist principles of Witold Mańczak, a prominent general and historical linguist. He defines language not as a "system of signs" or a "computational system of the brain" but as the totality of all that is said and written. Above all, he identifies the frequency of use of particular language elements as the primary governing principle of language. Using his framework, we challenge prior critiques of LLMs and provide a constructive guide for designing, evaluating, and interpreting language models.
