More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Abstract

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8 to 4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons, improving each tested model family and context condition under early fusion. However, scaling from DeBERTa-v3-base to large and from 12B to larger LLMs does not guarantee gains, and simple early fusion outperforms the tested late-fusion and cross-attention RAG variants for encoders. Per-value analyses show that context and retrieval help most for socially situated or conceptually confusable values. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.
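
To make the compared input conditions concrete, the following is a minimal sketch of how sentence, window, and full-document inputs could be assembled, with retrieved knowledge added by early fusion (plain concatenation into the model input). The abstract does not specify the actual template, so the function names `build_context` and `build_input`, the `[SEP]` separator, the window size, and the ordering of the parts are all illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence

# Assumed separator; the actual input template is not given in the abstract.
SEP = " [SEP] "


def build_context(sentences: Sequence[str], idx: int, mode: str, window: int = 1) -> str:
    """Return the context text for the target sentence at position idx."""
    if mode == "sentence":
        # Sentence-only condition: no surrounding context.
        return ""
    if mode == "window":
        # Hypothetical symmetric window of `window` sentences on each side.
        lo = max(0, idx - window)
        hi = min(len(sentences), idx + window + 1)
        return " ".join(sentences[lo:hi])
    if mode == "document":
        # Full-document condition: every sentence of the document.
        return " ".join(sentences)
    raise ValueError(f"unknown mode: {mode}")


def build_input(
    sentences: Sequence[str],
    idx: int,
    mode: str,
    knowledge: Sequence[str] = (),
    window: int = 1,
) -> str:
    """Early fusion: join target, context, and retrieved notes into one string."""
    parts = [sentences[idx]]
    context = build_context(sentences, idx, mode, window)
    if context:
        parts.append(context)
    if knowledge:
        # Retrieved knowledge-base entries are appended as plain text.
        parts.append(" ".join(knowledge))
    return SEP.join(parts)


if __name__ == "__main__":
    doc = ["We must protect our borders.", "Families deserve safety.", "Taxes fund this."]
    notes = ["Security: safety, harmony, and stability of society."]  # made-up example entry
    for mode in ("sentence", "window", "document"):
        print(mode, "->", build_input(doc, 1, mode, knowledge=notes))
```

Under this sketch, the no-RAG setting corresponds to calling `build_input` with an empty `knowledge` argument, so the context condition and the retrieval condition can be varied independently, which is what a matched comparison requires.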