LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews

Abstract

Peer review is a cornerstone of quality control in scientific publishing. As reviewer workloads grow, the unintended use of 'quick' heuristics, referred to as lazy thinking, has emerged as a recurring issue that compromises review quality. Automated methods to detect such heuristics can help improve the peer-review process. However, NLP research on this issue is limited, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset boosts performance by 10-20 points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines, which can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)
