Leveraging Large Language Models for Predictive Analysis of Human Misery

Abstract


This study investigates the use of Large Language Models (LLMs) for predicting human-perceived misery scores from natural language descriptions of real-world scenarios. The task is framed as a regression problem, where the model assigns a scalar value from 0 to 100 to each input statement. We evaluate multiple prompting strategies, including zero-shot, fixed-context few-shot, and retrieval-based prompting using BERT sentence embeddings. Few-shot approaches consistently outperform zero-shot baselines, underscoring the value of contextual examples in affective prediction. To move beyond static evaluation, we introduce the "Misery Game Show", a novel gamified framework inspired by a television format. It tests LLMs through structured rounds involving ordinal comparison, binary classification, scalar estimation, and feedback-driven reasoning. This setup enables us to assess not only predictive accuracy but also the model's ability to adapt based on corrective feedback. The gamified evaluation highlights the broader potential of LLMs in dynamic emotional reasoning tasks beyond standard regression. Code and data link: https://github.com/abhi1nandy2/Misery_Data_Exps_GitHub
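The abstract does not specify how retrieval-based prompting chooses its demonstrations. The following is a minimal sketch that assumes three things: cosine similarity over precomputed sentence embeddings, top-k selection, and a plain-text prompt template. The toy pool, the three-dimensional vectors standing in for BERT embeddings, the prompt wording, and the helper names (`select_examples`, `build_prompt`, `parse_score`) are all hypothetical. None of them come from the linked repository. No LLM is called here; the resulting prompt would be sent to whichever model is being evaluated.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical labeled pool entry: (scenario text, misery score in [0, 100], embedding).
# In the setting described above the embeddings would come from a BERT sentence
# encoder; here they are tiny hand-made vectors so the sketch runs without a model.
Example = Tuple[str, float, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(query_emb: Sequence[float], pool: List[Example], k: int) -> List[Example]:
    """Return the k labeled examples whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_emb, ex[2]), reverse=True)
    return ranked[:k]


def build_prompt(query_text: str, shots: List[Example]) -> str:
    """Format the retrieved examples as demonstrations, followed by the unlabeled query."""
    blocks = ["Rate how miserable each situation is on a scale from 0 to 100."]
    for text, score, _ in shots:
        blocks.append(f"Situation: {text}\nMisery score: {score:g}")
    blocks.append(f"Situation: {query_text}\nMisery score:")
    return "\n\n".join(blocks)


def parse_score(reply: str) -> float:
    """Take the first number in the model's reply and clamp it to the [0, 100] range."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return min(100.0, max(0.0, float(match.group())))


# Usage with a toy pool (texts, scores, and vectors are made up for illustration).
pool: List[Example] = [
    ("You drop your ice cream cone on the sidewalk.", 20, [1.0, 0.0, 0.0]),
    ("Your flight is cancelled and you miss a wedding.", 70, [0.0, 1.0, 0.2]),
    ("You step in a deep puddle wearing new shoes.", 25, [0.9, 0.1, 0.0]),
]
shots = select_examples([1.0, 0.0, 0.0], pool, k=2)
prompt = build_prompt("Your umbrella breaks in a rainstorm.", shots)
print(prompt)
```

Within this sketch, zero-shot prompting is simply `build_prompt` called with an empty `shots` list. Fixed-context few-shot prompting reuses the same examples for every query. The retrieval variant differs only in choosing a fresh, query-specific set for each input.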
