BenSyc: A Benchmark for Conversational Sycophancy and Human Alignment of Large Language Models in Bengali Contexts

Abstract

Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.
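
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Macro-F1 over the five-level taxonomy could be computed. It is illustrative only and is not the authors' evaluation code: the function name `macro_f1`, the `LABELS` list ordering, and the convention of scoring a class with no gold or predicted instances as 0.0 are all assumptions. The value is returned on a 0 to 1 scale; multiply by 100 to compare with figures such as 61.7.

```python
# Hypothetical label set mirroring the five-level taxonomy named in the abstract.
LABELS = ["Invalidation", "Neutral", "Support", "Validation", "Escalation"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (Macro-F1), on a 0-1 scale.

    Assumption: a class with no true positives, false positives, or
    false negatives contributes an F1 of 0.0.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    per_class_f1 = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)

        # F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean
        # of precision and recall whenever both are defined.
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)

    # Every class counts equally, regardless of how many examples it has.
    return sum(per_class_f1) / len(per_class_f1)
```

Because each class is weighted equally, a model that confuses Support with Validation is penalized on both classes even when the more frequent classes are predicted well, which is why Macro-F1 suits an imbalanced fine-grained taxonomy.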