

Leveraging Large Language Models for Predictive Analysis of Human Misery

August 18, 2025
Authors: Bishanka Seal, Rahul Seetharaman, Aman Bansal, Abhilash Nandy
cs.AI

Abstract

This study investigates the use of Large Language Models (LLMs) for predicting human-perceived misery scores from natural language descriptions of real-world scenarios. The task is framed as a regression problem, where the model assigns a scalar value from 0 to 100 to each input statement. We evaluate multiple prompting strategies, including zero-shot, fixed-context few-shot, and retrieval-based prompting using BERT sentence embeddings. Few-shot approaches consistently outperform zero-shot baselines, underscoring the value of contextual examples in affective prediction. To move beyond static evaluation, we introduce the "Misery Game Show", a novel gamified framework inspired by a television format. It tests LLMs through structured rounds involving ordinal comparison, binary classification, scalar estimation, and feedback-driven reasoning. This setup enables us to assess not only predictive accuracy but also the model's ability to adapt based on corrective feedback. The gamified evaluation highlights the broader potential of LLMs in dynamic emotional reasoning tasks beyond standard regression. Code and data link: https://github.com/abhi1nandy2/Misery_Data_Exps_GitHub
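The retrieval-based few-shot prompting the abstract describes can be sketched as follows. This is an illustrative sketch, not the authors' code: the `embed` function below is a toy bag-of-characters stand-in for the BERT sentence encoder the paper uses, and the example pool and prompt wording are invented for illustration.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in encoder: bag-of-characters vector.
    # A real pipeline would use BERT sentence embeddings here.
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query: str, labeled_pool: list[tuple[str, int]], k: int = 2) -> str:
    # Retrieve the k labeled scenarios most similar to the query,
    # then assemble a few-shot prompt asking for a 0-100 misery score.
    qv = embed(query)
    ranked = sorted(labeled_pool,
                    key=lambda ex: cosine(qv, embed(ex[0])),
                    reverse=True)
    lines = ["Rate the misery of each scenario from 0 (none) to 100 (extreme)."]
    for text, score in ranked[:k]:
        lines.append(f"Scenario: {text}\nMisery score: {score}")
    lines.append(f"Scenario: {query}\nMisery score:")
    return "\n\n".join(lines)

# Hypothetical labeled examples (not from the paper's dataset).
pool = [
    ("You stub your toe on the bed frame.", 30),
    ("Your flight is delayed by six hours.", 55),
    ("You drop your phone in the toilet.", 60),
]
print(build_prompt("Your train is delayed by two hours.", pool))
```

The completed prompt would then be sent to the LLM, whose continuation is parsed as the scalar prediction; the same scaffold extends to the zero-shot (k = 0) and fixed-context variants by changing how the in-context examples are chosen.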
August 20, 2025