Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content

March 20, 2025
Authors: Sai Kartheek Reddy Kasu, Shankar Biradar, Sunil Saumya
cs.AI

Abstract

This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information, using the ChatGPT-4o model. Each instance is labeled with a Satire Level, ranging from 1 (subtle satire) to 3 (high-level satire), and is classified into five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines for the proposed dataset, giving future research a reference point for benchmarking and advancing deceptive humor detection models.
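The labeling scheme described in the abstract can be pictured as a small record type. The following is only an illustrative sketch: the class name `DHDInstance`, the field names, and the monolingual language tags (`En`, `Te`, `Hi`, `Ka`, `Ta`) are assumptions made here and are not taken from the paper or the released dataset. Only the satire range (1 to 3), the five category names, and the code-mixed tags come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class HumorCategory(Enum):
    """The five humor categories named in the abstract."""
    DARK_HUMOR = "Dark Humor"
    IRONY = "Irony"
    SOCIAL_COMMENTARY = "Social Commentary"
    WORDPLAY = "Wordplay"
    ABSURDITY = "Absurdity"


# Code-mixed tags (Te-En, Hi-En, Ka-En, Ta-En) appear in the abstract.
# The monolingual tags are hypothetical abbreviations chosen for this sketch.
LANGUAGES = frozenset(
    {"En", "Te", "Hi", "Ka", "Ta", "Te-En", "Hi-En", "Ka-En", "Ta-En"}
)


@dataclass(frozen=True)
class DHDInstance:
    """Hypothetical shape of one labeled comment; field names are assumed."""
    comment: str
    satire_level: int              # 1 = subtle satire, 3 = high-level satire
    humor_category: HumorCategory
    language: str

    def __post_init__(self):
        # Reject labels outside the ranges stated in the abstract.
        if self.satire_level not in (1, 2, 3):
            raise ValueError(f"satire_level must be 1, 2, or 3, got {self.satire_level}")
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language tag: {self.language}")
```

Under these assumptions, a record would be built as `DHDInstance("some comment text", 2, HumorCategory.IRONY, "Te-En")`, and an out-of-range satire level would raise a `ValueError`. The actual file format and column names of DHD may differ.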
