ChatPaper.ai


Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts

April 29, 2025
作者: Hanhua Hong, Chenghao Xiao, Yang Wang, Yiqi Liu, Wenge Rong, Chenghua Lin
cs.AI

Abstract

Evaluating natural language generation (NLG) systems is challenging due to the diversity of valid outputs. While human evaluation is the gold standard, it suffers from inconsistencies, lack of standardisation, and demographic biases, limiting reproducibility. LLM-based evaluation offers a scalable alternative but is highly sensitive to prompt design, where small variations can lead to significant discrepancies. In this work, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Our method requires only a single evaluation sample and eliminates the need for time-consuming manual prompt engineering, thereby improving both efficiency and robustness. Our work contributes toward a new direction for more robust and efficient LLM-based evaluation.
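The core idea can be sketched in a few lines: given a single evaluation sample (a model's output), an inversion model predicts the instruction that would have produced it, and that predicted instruction is reused as a model-specific evaluation prompt. The sketch below is illustrative only; `inversion_model` is a hypothetical stand-in stub, not the authors' trained inverter, and the prompt wording is invented for the example.

```python
def inversion_model(output_text: str) -> str:
    """Stub for a trained output-to-instruction inverter.

    In the paper's setting this would be a model learned to map
    outputs back to the instructions that generated them; here we
    return a fixed placeholder instruction for illustration.
    """
    return ("Rate the following summary for fluency and coherence "
            "on a 1-5 scale, then briefly justify the score.")


def build_evaluation_prompt(sample_output: str, candidate: str) -> str:
    # 1) Invert the single evaluation sample into an instruction.
    instruction = inversion_model(sample_output)
    # 2) Attach the candidate text to be judged under that instruction.
    return f"{instruction}\n\nText to evaluate:\n{candidate}"


prompt = build_evaluation_prompt(
    sample_output="Score: 4. The summary is fluent but omits one key fact.",
    candidate="The city council approved the new budget on Tuesday.",
)
print(prompt)
```

Note that only one evaluation sample is needed to recover the instruction, which is what removes the manual prompt-engineering loop described in the abstract.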
