RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
March 10, 2026
Authors: Sihong Wu, Yiling Ma, Yilun Zhao, Tiansheng Hu, Owen Jiang, Manasi Patwardhan, Arman Cohan
cs.AI
Abstract
Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance; this is the gap our work addresses. We propose RbtAct, which targets actionable review feedback generation and places the existing peer-review rebuttal at the center of learning. Rebuttals reveal which reviewer comments led to concrete revisions or specific plans, and which were merely argued against without changes. Building on this insight, we use rebuttals as implicit supervision to directly optimize a feedback generator for actionability. To support this objective, we introduce a new task, perspective-conditioned segment-level review feedback generation, in which the model must produce a single focused comment given the complete paper and a specified perspective (e.g., experimental design or writing). We also build RMR-75K, a 75K-example dataset that maps review segments to the rebuttal segments that address them, annotated with perspective labels and impact categories that rank the degree of author uptake. We then train Llama-3.1-8B-Instruct with supervised fine-tuning on review segments, followed by preference optimization on rebuttal-derived preference pairs. Evaluations by human experts and with LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines, while maintaining grounding and relevance.
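To make the recipe concrete, below is a minimal Python sketch of how rebuttal-derived preference pairs could be assembled from RMR-75K-style records. Everything in it is an illustrative assumption: the field names, the impact-category labels ("revised", "planned", "defended"), and the pairing rule are ours, not the released data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical layout of one RMR-75K-style record; the real field names and
# impact-category inventory are assumptions, not the released schema.
@dataclass
class ReviewRebuttalRecord:
    paper_text: str        # full paper text
    perspective: str       # e.g. "experiments" or "writing"
    review_segment: str    # one focused reviewer comment
    rebuttal_segment: str  # the rebuttal segment that addresses it
    impact: str            # author-uptake category (see IMPACT_RANK)

# Assumed ordering of impact categories by author uptake: a comment that
# triggered a revision outranks one that only drew a defense.
IMPACT_RANK: Dict[str, int] = {"revised": 2, "planned": 1, "defended": 0}

def build_prompt(paper_text: str, perspective: str) -> str:
    """Perspective-conditioned prompt: the complete paper plus one target aspect."""
    return (
        f"Paper:\n{paper_text}\n\n"
        f"Write one focused review comment on the paper's {perspective}."
    )

def rebuttal_preference_pairs(
    records: List[ReviewRebuttalRecord],
) -> List[Dict[str, str]]:
    """Derive (prompt, chosen, rejected) pairs from rebuttal impact: for the
    same paper and perspective, the comment with higher author uptake is
    preferred over the comment with lower uptake."""
    grouped: Dict[Tuple[str, str], List[ReviewRebuttalRecord]] = {}
    for rec in records:
        grouped.setdefault((rec.paper_text, rec.perspective), []).append(rec)

    pairs: List[Dict[str, str]] = []
    for (paper_text, perspective), group in grouped.items():
        for a in group:
            for b in group:
                if IMPACT_RANK[a.impact] > IMPACT_RANK[b.impact]:
                    pairs.append({
                        "prompt": build_prompt(paper_text, perspective),
                        "chosen": a.review_segment,    # higher uptake
                        "rejected": b.review_segment,  # lower uptake
                    })
    return pairs
```

The resulting prompt/chosen/rejected triples are in the format commonly consumed by DPO-style preference-optimization trainers. The abstract does not specify which preference-optimization algorithm RbtAct uses, so treat the pairing rule above as one plausible instantiation rather than the paper's exact procedure.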