RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
March 10, 2026
Authors: Sihong Wu, Yiling Ma, Yilun Zhao, Tiansheng Hu, Owen Jiang, Manasi Patwardhan, Arman Cohan
cs.AI
Abstract
Large language models (LLMs) are increasingly used across the scientific workflow, including for drafting peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance; this is the gap our work addresses. We propose RbtAct, which targets actionable review feedback generation and places existing peer-review rebuttals at the center of learning. Rebuttals reveal which reviewer comments led to concrete revisions or specific plans, and which merely drew a defensive response. Building on this insight, we leverage rebuttals as an implicit supervision signal to directly optimize a feedback generator for actionability. To support this objective, we introduce a new task, perspective-conditioned segment-level review feedback generation, in which the model must produce a single focused comment given the complete paper and a specified perspective (e.g., experimental design or writing). We also build RMR-75K, a large-scale dataset of 75K examples that maps review segments to the rebuttal segments addressing them, annotated with perspective labels and impact categories that rank the degree of author uptake. We then train Llama-3.1-8B-Instruct with supervised fine-tuning on review segments, followed by preference optimization on rebuttal-derived pairs. Evaluations by human experts and an LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines while maintaining grounding and relevance.
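The abstract does not spell out how rebuttal-derived preference pairs are constructed. Below is a minimal sketch, not the authors' released pipeline, of one plausible construction: grouping review segments by paper and perspective, then preferring the segment whose rebuttal shows stronger author uptake. The record field names (`paper_id`, `perspective`, `impact`, `paper_text`, `review_segment`) and the impact ordering (revision > plan > defense) are illustrative assumptions, not details from the paper.

```python
from itertools import combinations

# Assumed ordering of author uptake; the paper's actual impact
# categories and their ranking may differ.
IMPACT_RANK = {"revision": 2, "plan": 1, "defense": 0}

def build_preference_pairs(records):
    """Pair review segments for the same paper and perspective,
    preferring the one whose rebuttal shows stronger author uptake."""
    by_key = {}
    for r in records:
        by_key.setdefault((r["paper_id"], r["perspective"]), []).append(r)

    pairs = []
    for (paper_id, perspective), segs in by_key.items():
        for a, b in combinations(segs, 2):
            ra, rb = IMPACT_RANK[a["impact"]], IMPACT_RANK[b["impact"]]
            if ra == rb:
                continue  # no preference signal between equal-impact segments
            chosen, rejected = (a, b) if ra > rb else (b, a)
            prompt = (
                f"Paper: {chosen['paper_text']}\n"
                f"Perspective: {perspective}\n"
                "Write one focused, actionable review comment."
            )
            pairs.append({
                "prompt": prompt,
                "chosen": chosen["review_segment"],
                "rejected": rejected["review_segment"],
            })
    return pairs

# The resulting {"prompt", "chosen", "rejected"} dicts match the format
# expected by common preference-optimization tooling, e.g. TRL's DPOTrainer
# applied after the SFT stage:
#   from datasets import Dataset
#   from trl import DPOConfig, DPOTrainer
#   DPOTrainer(model=sft_model, args=DPOConfig(output_dir="out"),
#              train_dataset=Dataset.from_list(pairs),
#              processing_class=tokenizer).train()
```

Pairing segments within the same paper and perspective keeps the comparison controlled: both comments address the same content, so the impact difference plausibly reflects actionability rather than topic.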