Language Models Model Language

Abstract

Linguistic commentary on LLMs, heavily influenced by the theoretical frameworks of de Saussure and Chomsky, is often speculative and unproductive. Critics challenge whether LLMs can legitimately model language, citing the need for "deep structure" or "grounding" to achieve an idealized linguistic "competence." We argue for a radical shift in perspective towards the empiricist principles of Witold Mańczak, a prominent general and historical linguist. He defines language not as a "system of signs" or a "computational system of the brain" but as the totality of all that is said and written. Above all, he identifies frequency of use of particular language elements as language's primary governing principle. Using his framework, we challenge prior critiques of LLMs and provide a constructive guide for designing, evaluating, and interpreting language models.