
Leveraging Large Language Models for Predictive Analysis of Human Misery

August 18, 2025
Authors: Bishanka Seal, Rahul Seetharaman, Aman Bansal, Abhilash Nandy
cs.AI

Abstract

This study investigates the use of Large Language Models (LLMs) for predicting human-perceived misery scores from natural language descriptions of real-world scenarios. The task is framed as a regression problem, where the model assigns a scalar value from 0 to 100 to each input statement. We evaluate multiple prompting strategies, including zero-shot, fixed-context few-shot, and retrieval-based prompting using BERT sentence embeddings. Few-shot approaches consistently outperform zero-shot baselines, underscoring the value of contextual examples in affective prediction. To move beyond static evaluation, we introduce the "Misery Game Show", a novel gamified framework inspired by a television format. It tests LLMs through structured rounds involving ordinal comparison, binary classification, scalar estimation, and feedback-driven reasoning. This setup enables us to assess not only predictive accuracy but also the model's ability to adapt based on corrective feedback. The gamified evaluation highlights the broader potential of LLMs in dynamic emotional reasoning tasks beyond standard regression. Code and data link: https://github.com/abhi1nandy2/Misery_Data_Exps_GitHub
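The abstract describes retrieval-based prompting: for each query scenario, the most similar labeled examples are retrieved via sentence embeddings and placed in the prompt before the query. The sketch below illustrates that pipeline under stated assumptions — `embed` is a toy bag-of-characters stand-in for the BERT sentence embeddings used in the paper, and `build_prompt` and the prompt wording are hypothetical, not the authors' implementation.

```python
from math import sqrt

def embed(text):
    # Toy bag-of-characters embedding; a real system would use
    # BERT sentence embeddings as described in the abstract.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query, labeled_pool, k=2):
    """Retrieve the k labeled scenarios most similar to the query
    and format a few-shot prompt asking for a 0-100 misery score."""
    q = embed(query)
    ranked = sorted(labeled_pool,
                    key=lambda ex: cosine(q, embed(ex["text"])),
                    reverse=True)
    lines = ["Rate the misery of each scenario from 0 (none) to 100 (extreme)."]
    for ex in ranked[:k]:
        lines.append(f'Scenario: {ex["text"]}\nMisery score: {ex["score"]}')
    lines.append(f"Scenario: {query}\nMisery score:")
    return "\n\n".join(lines)

# Hypothetical labeled pool; scores are illustrative only.
pool = [
    {"text": "You stub your toe on the bed frame.", "score": 30},
    {"text": "You lose your wallet on vacation.", "score": 65},
    {"text": "Your flight is delayed by ten minutes.", "score": 15},
]
prompt = build_prompt("You lose your phone on a trip.", pool, k=2)
print(prompt)
```

The completed prompt would then be sent to the LLM, whose continuation is parsed as the scalar prediction; the zero-shot baseline corresponds to omitting the retrieved examples entirely.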