More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Text

Abstract

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8–4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons, improving each tested model family and context condition under early fusion. However, scaling from DeBERTa-v3-base to large and from 12B to larger LLMs does not guarantee gains, and simple early fusion outperforms the tested late-fusion and cross-attention RAG variants for encoders. Per-value analyses show that context and retrieval help most for socially situated or conceptually confusable values. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.
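The abstract compares three input conditions (sentence, window, full document) and early-fusion retrieval augmentation, and it reports results in macro-F1. The minimal Python sketch below shows one way such inputs might be assembled and scored. Several details are illustrative assumptions and are not the authors' implementation: the helper names, the window radius, the `[SEP]` separator string, the example sentences, and the value labels and knowledge snippet.

```python
from typing import List, Sequence, Set


def build_context(sentences: Sequence[str], idx: int, mode: str, radius: int = 2) -> str:
    """Return the text shown to the model for the target sentence at `idx`."""
    if mode == "sentence":
        # Target sentence only.
        return sentences[idx]
    if mode == "window":
        # Target plus up to `radius` neighbours on each side (radius is an assumption).
        lo = max(0, idx - radius)
        hi = min(len(sentences), idx + radius + 1)
        return " ".join(sentences[lo:hi])
    if mode == "document":
        # Every sentence of the source document.
        return " ".join(sentences)
    raise ValueError(f"unknown context mode: {mode!r}")


def early_fusion_input(target: str, context: str, knowledge: Sequence[str], sep: str = " [SEP] ") -> str:
    """Early fusion: join the target, its context, and retrieved knowledge into
    one string *before* encoding, so the encoder attends over all of it jointly.
    The separator string is a hypothetical choice."""
    # In sentence-only mode the context equals the target, so avoid duplicating it.
    segments = [target] if context == target else [target, context]
    return sep.join(segments + list(knowledge))


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 for multi-label predictions.
    A label with no gold and no predicted instances scores 0.0 here."""
    per_label = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


# Illustrative usage with made-up sentences, labels, and a hypothetical knowledge-base entry.
doc = ["Families deserve security.", "Taxes must fall.", "The state should protect the vulnerable."]
window = build_context(doc, 0, "window", radius=1)
fused = early_fusion_input(doc[0], window, ["Security: stability of society, relationships, and self."])
score = macro_f1([{"Security"}, {"Benevolence"}], [{"Security"}, set()], ["Security", "Benevolence"])
```

In this sketch, early fusion is plain concatenation before encoding. By contrast, the late-fusion and cross-attention variants mentioned in the abstract would encode the knowledge separately and then combine representations or predictions. Because macro-F1 averages per-label scores without weighting, gains on rarer or more confusable values move it as much as gains on frequent ones.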