LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations

Abstract

Online group chats are social spaces with local conversational norms that are rarely stated explicitly. Whether LLM-based agents are able and willing to recognize and adapt to these norms remains largely unexplored. We introduce LoSoNA, a benchmark for local social norm adaptation in multi-party chat. Each scenario gives a subject model a curated group-chat transcript in which non-subject participants demonstrate a hidden local norm, followed by a final elicitor turn that forces a response revealing whether the subject has inferred that norm. We evaluate eight frontier and open-weight models under four prompting conditions that vary how explicitly the model is told to treat the prior conversation as evidence for how it should answer. Under naive prompting, performance remains limited for most models; explicit norm-aware prompting helps unevenly, with Gemini 3.1 Pro reaching 84.2% and Claude Fable 5 reaching 81.6%, while several other models show only small gains or even regressions. LoSoNA responds to recent calls for evaluating the social capabilities of LLMs by testing whether models can infer local conversational norms from precedent and apply them in a single-turn group-chat response.