

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

September 4, 2025
Authors: Yang Wang, Chenghao Xiao, Chia-Yi Hsiao, Zi Yan Chang, Chi-Li Chen, Tyler Loakman, Chenghua Lin
cs.AI

Abstract

We introduce Drivelology, a unique linguistic phenomenon characterised as "nonsense with depth": utterances that are syntactically coherent yet pragmatically paradoxical, emotionally loaded, or rhetorically subversive. While such expressions may resemble surface-level nonsense, they encode implicit meaning requiring contextual inference, moral reasoning, or emotional interpretation. We find that current large language models (LLMs), despite excelling at many natural language processing (NLP) tasks, consistently fail to grasp the layered semantics of Drivelological text. To investigate this, we construct a small but diverse benchmark dataset of over 1,200 meticulously curated examples, with select instances in English, Mandarin, Spanish, French, Japanese, and Korean. Annotation was especially challenging: each example required careful expert review to verify that it truly reflected Drivelological characteristics. The process involved multiple rounds of discussion and adjudication to address disagreements, highlighting the subtle and subjective nature of Drivelology. We evaluate a range of LLMs on classification, generation, and reasoning tasks. Our results reveal clear limitations of LLMs: models often confuse Drivelology with shallow nonsense, produce incoherent justifications, or miss the implied rhetorical function altogether. These findings highlight a deeper representational gap in LLMs' pragmatic understanding and challenge the assumption that statistical fluency implies cognitive comprehension. We release our dataset and code to facilitate further research in modelling linguistic depth beyond surface-level coherence.
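The classification task described above can be pictured as a simple evaluation loop over (text, label) pairs, scoring how often a model distinguishes Drivelology from shallow nonsense. The sketch below is purely illustrative and not from the paper's released code: the `query_llm` function is a hypothetical stand-in for a real model call, and the two benchmark examples are invented, not drawn from the dataset.

```python
# Minimal sketch of a binary classification evaluation:
# Drivelology ("nonsense with depth") vs. shallow nonsense.
# All names and examples here are hypothetical placeholders.

def query_llm(text: str) -> str:
    """Stand-in for an LLM call; returns 'drivelology' or 'nonsense'.

    A real evaluation would prompt a model and parse its answer.
    Here a trivial length heuristic keeps the sketch runnable.
    """
    return "drivelology" if len(text.split()) > 5 else "nonsense"

def evaluate(benchmark: list[tuple[str, str]]) -> float:
    """Return accuracy of the stand-in model on (text, gold_label) pairs."""
    correct = sum(query_llm(text) == gold for text, gold in benchmark)
    return correct / len(benchmark)

# Two invented examples, for illustration only.
benchmark = [
    ("Never argue with the sea; it always has the last wave.", "drivelology"),
    ("Blue potato sings algebra.", "nonsense"),
]

if __name__ == "__main__":
    print(f"accuracy = {evaluate(benchmark):.2f}")
```

A real harness would additionally handle the multilingual examples, prompt formatting per model, and the generation and reasoning tasks, which cannot be reduced to a single accuracy number.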