More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Abstract

Detecting Schwartz values in political text is difficult because implicit cues often depend on the surrounding argument and on fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented generation (RAG) settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8 to 4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons: under early fusion, it improves every tested model family in every context condition. However, scaling from DeBERTa-v3-base to large, or from 12B to larger LLMs, does not guarantee gains, and for encoders simple early fusion outperforms the tested late-fusion and cross-attention RAG variants. Per-value analyses show that context and retrieval help most for values that are socially situated or conceptually confusable. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.