Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques
January 13, 2026
Authors: Marvin Schmitt, Anne Schwerk, Sebastian Lempert
cs.AI
Abstract
This study investigates the use of prompt engineering to enhance large language models (LLMs), specifically GPT-4o-mini and gemini-1.5-flash, on sentiment analysis tasks. It evaluates advanced prompting techniques, including few-shot learning, chain-of-thought prompting, and self-consistency, against a baseline. Key tasks include sentiment classification, aspect-based sentiment analysis, and the detection of subtle nuances such as irony. The research details the theoretical background, datasets, and methods used, measuring model performance by accuracy, recall, precision, and F1 score. Findings reveal that advanced prompting significantly improves sentiment analysis: the few-shot approach performs best with GPT-4o-mini, while chain-of-thought prompting boosts irony detection in gemini-1.5-flash by up to 46%. Thus, although advanced prompting techniques improve performance overall, the fact that few-shot prompting works best for GPT-4o-mini while chain-of-thought excels at irony detection in gemini-1.5-flash suggests that prompting strategies must be tailored to both the model and the task. This highlights the importance of aligning prompt design with both the LLM's architecture and the semantic complexity of the task.
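To make the compared techniques concrete, the following is a minimal sketch of how a baseline, a few-shot, and a chain-of-thought prompt might be constructed for these tasks, plus a majority-vote helper for self-consistency. The prompt wording and function names here are illustrative assumptions; the abstract does not give the authors' actual prompts.

```python
from collections import Counter


def baseline_prompt(text: str) -> str:
    """Zero-shot baseline: ask directly for a sentiment label."""
    return (
        "Classify the sentiment of the following text as positive, "
        f"negative, or neutral.\n\nText: {text}\nSentiment:"
    )


def few_shot_prompt(text: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: prepend labeled examples before the query
    (the approach reported to perform best with GPT-4o-mini)."""
    shots = "\n".join(f"Text: {t}\nSentiment: {label}" for t, label in examples)
    return (
        "Classify the sentiment of each text as positive, negative, "
        f"or neutral.\n\n{shots}\n\nText: {text}\nSentiment:"
    )


def chain_of_thought_prompt(text: str) -> str:
    """Chain-of-thought: request step-by-step reasoning before the label
    (reported to boost irony detection in gemini-1.5-flash)."""
    return (
        "Does the following text contain irony? Think step by step: first "
        "state the literal meaning, then the likely intended meaning, then "
        f"answer 'ironic' or 'not ironic'.\n\nText: {text}\nReasoning:"
    )


def self_consistency(labels: list[str]) -> str:
    """Self-consistency: sample several reasoning chains from the model
    and take the majority label across their answers."""
    return Counter(labels).most_common(1)[0][0]
```

Each prompt string would be sent to the model under test, with the returned label compared against gold annotations to compute accuracy, recall, precision, and F1.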