What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance

February 23, 2026
Authors: William Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso
cs.AI

Abstract

Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and a model's) response. We operationalize this insight by constructing a 22-dimensional query feature vector covering clause complexity, lexical rarity, anaphora, negation, answerability, and intention grounding, all known to affect human comprehension. Using 369,837 real-world queries, we ask: Are there certain types of queries that make hallucination more likely? A large-scale analysis reveals a consistent "risk landscape": features such as deep clause nesting and underspecification align with higher hallucination propensity, while clear intention grounding and answerability align with lower hallucination rates. Others, including domain specificity, show mixed, dataset- and model-dependent effects. These findings establish an empirically observable query-feature representation correlated with hallucination risk, paving the way for guided query rewriting and future intervention studies.
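
To make the idea of a query feature vector concrete, here is a minimal Python sketch. It is not the authors' implementation: the feature names, word lists, and heuristics are hypothetical stand-ins for a few of the dimensions the abstract names (clause complexity, negation, anaphora, intention grounding, lexical rarity).

```python
import re

# Toy illustration of a query feature vector. All word lists and
# heuristics below are hypothetical simplifications; the paper's
# actual 22 dimensions are not reproduced here.
NEGATION_WORDS = {"not", "no", "never", "without", "neither", "nor"}
INTENT_ANCHORS = {"what", "who", "when", "where", "why", "how", "which"}
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those"}
CLAUSE_MARKERS = {"because", "although", "while", "that", "which", "whose"}

def query_features(query: str) -> dict:
    """Map a query string to a small dictionary of linguistic proxies
    (a stand-in for the paper's 22-dimensional feature vector)."""
    tokens = re.findall(r"[a-z']+", query.lower())
    n = max(len(tokens), 1)
    return {
        # subordinate-clause markers as a crude proxy for clause nesting
        "clause_markers": float(sum(t in CLAUSE_MARKERS for t in tokens)),
        # negation density, known to burden human comprehension
        "negation_rate": sum(t in NEGATION_WORDS for t in tokens) / n,
        # pronoun density as a rough proxy for anaphora / underspecification
        "pronoun_rate": sum(t in PRONOUNS for t in tokens) / n,
        # an explicit wh-word suggests clearer intention grounding
        "intent_anchored": float(any(t in INTENT_ANCHORS for t in tokens)),
        # mean token length as a cheap lexical-rarity proxy
        "mean_token_len": sum(map(len, tokens)) / n,
    }

print(query_features(
    "Why, although it was never approved, do they still cite that ruling?"
))
```

In the paper's setting, each of the 369,837 queries would be mapped to such a vector and correlated with observed hallucination outcomes.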