
LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews

April 15, 2025
Authors: Sukannya Purkayastha, Zhuang Li, Anne Lauscher, Lizhen Qu, Iryna Gurevych
cs.AI

Abstract

Peer review is a cornerstone of quality control in scientific publishing. With the increasing workload, the unintended use of 'quick' heuristics, referred to as lazy thinking, has emerged as a recurring issue compromising review quality. Automated methods to detect such heuristics can help improve the peer-reviewing process. However, there is limited NLP research on this issue, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 performance points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines that can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)
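
The zero-shot setting mentioned in the abstract can be illustrated with a minimal sketch: a single prompt asks an LLM to assign a lazy-thinking category to one review sentence. The category list, prompt wording, and example sentence below are hypothetical illustrations only, not the authors' actual taxonomy or pipeline; see the linked repository for the real prompts and evaluation code.

```python
# Hypothetical sketch of zero-shot lazy-thinking detection via prompting.
# Categories and wording are illustrative, not the paper's full taxonomy.

LAZY_THINKING_CATEGORIES = [
    "The results are not surprising",
    "The topic is too niche",
    "The idea is not novel enough",
    "None (no lazy thinking)",
]

def build_zero_shot_prompt(review_sentence: str) -> str:
    """Compose a single zero-shot classification prompt for one review sentence."""
    options = "\n".join(f"- {c}" for c in LAZY_THINKING_CATEGORIES)
    return (
        "You are assessing peer-review quality. Decide whether the sentence below "
        "relies on a 'lazy thinking' heuristic, and if so, which one.\n\n"
        f"Candidate categories:\n{options}\n\n"
        f"Review sentence: \"{review_sentence}\"\n\n"
        "Answer with exactly one category from the list."
    )

if __name__ == "__main__":
    sentence = "The paper should be rejected because the idea is not novel enough."
    prompt = build_zero_shot_prompt(sentence)
    print(prompt)  # Send this prompt to an LLM of your choice to obtain a label.
```

Instruction-based fine-tuning, as described in the abstract, would instead pair prompts like this with gold category labels from the annotated dataset and update the model on those pairs.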

