LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews

Abstract

Peer review is a cornerstone of quality control in scientific publishing. With increasing workloads, the unintended use of "quick" heuristics, referred to as lazy thinking, has emerged as a recurring issue that compromises review quality. Automated methods to detect such heuristics can help improve the peer-reviewing process. However, NLP research on this issue is limited, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines, which can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)