Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
February 26, 2026
Authors: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger
cs.AI
Abstract
The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science has so far lacked extensive validation of operational validity. This study introduces Conditioned Comment Prediction (CCP), a task in which a model predicts how a user would comment on a given stimulus; predictions are then evaluated against the user's authentic digital traces. This framework enables a rigorous assessment of current LLM capabilities for simulating social media user behavior. We evaluate open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish scenarios. By systematically comparing prompting strategies (explicit vs. implicit) and the impact of Supervised Fine-Tuning (SFT), we identify a critical form-vs-content decoupling in low-resource settings: while SFT aligns the surface structure of the generated text (length and syntax), it degrades semantic grounding. Furthermore, we demonstrate that explicit conditioning (generated biographies) becomes redundant under fine-tuning, as models successfully perform latent inference directly from behavioral histories. Our findings challenge current "naive prompting" paradigms and offer operational guidelines that prioritize authentic behavioral traces over descriptive personas for high-fidelity simulation.