RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Abstract

Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance; this is the gap we address. We propose RbtAct, which targets actionable review feedback generation and places existing peer-review rebuttals at the center of learning. Rebuttals reveal which reviewer comments led to concrete revisions or specific plans and which the authors merely argued against. Building on this insight, we use rebuttals as implicit supervision to directly optimize a feedback generator for actionability. To support this objective, we propose a new task, perspective-conditioned segment-level review feedback generation, in which the model must produce a single focused comment given the complete paper and a specified perspective, such as experiments or writing. We also build a large dataset, RMR-75K, that maps review segments to the rebuttal segments that address them, annotated with perspective labels and with impact categories that rank the degree of author uptake. We then train Llama-3.1-8B-Instruct with supervised fine-tuning on review segments, followed by preference optimization on rebuttal-derived pairs. Evaluations by human experts and LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines while maintaining grounding and relevance.
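The text above does not spell out how the rebuttal-derived preference pairs are formed. As an illustration only, the following Python sketch shows one plausible way to turn review segments labeled with a perspective and an impact category into prompt/chosen/rejected records for preference optimization. The impact category names and their ranking, the grouping by paper and perspective, the prompt template, and every field and function name (`IMPACT_RANK`, `ReviewSegment`, `make_prompt`, `build_preference_pairs`) are hypothetical assumptions, not the RMR-75K schema or the RbtAct training recipe.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical impact ranking: a larger value means stronger author uptake.
# These category names are illustrative placeholders, not RMR-75K's actual labels.
IMPACT_RANK = {
    "concrete_revision": 3,   # rebuttal reports a change already made
    "planned_change": 2,      # rebuttal promises a specific future change
    "clarification_only": 1,  # rebuttal only explains, no change
    "defended": 0,            # rebuttal argues the comment is not warranted
}


@dataclass(frozen=True)
class ReviewSegment:
    paper_id: str
    perspective: str  # e.g. "experiments" or "writing"
    text: str         # one focused reviewer comment
    impact: str       # key into IMPACT_RANK, taken from the matched rebuttal segment


def make_prompt(paper_text: str, perspective: str) -> str:
    """Hypothetical input template for perspective-conditioned generation."""
    return (
        f"Paper:\n{paper_text}\n\n"
        f"Write one specific, actionable review comment on the paper's {perspective}."
    )


def build_preference_pairs(segments, papers):
    """Turn impact-labeled review segments into prompt/chosen/rejected records.

    Segments are grouped by (paper_id, perspective) so both comments in a pair
    answer the same conditioning input. The comment with higher uptake becomes
    'chosen'; equal-rank pairs are skipped because they express no preference.
    """
    groups = {}
    for seg in segments:
        groups.setdefault((seg.paper_id, seg.perspective), []).append(seg)

    pairs = []
    for (paper_id, perspective), group in groups.items():
        for a, b in combinations(group, 2):
            rank_a, rank_b = IMPACT_RANK[a.impact], IMPACT_RANK[b.impact]
            if rank_a == rank_b:
                continue
            chosen, rejected = (a, b) if rank_a > rank_b else (b, a)
            pairs.append({
                "prompt": make_prompt(papers[paper_id], perspective),
                "chosen": chosen.text,
                "rejected": rejected.text,
            })
    return pairs


# Toy usage with invented example comments.
papers = {"p1": "(full paper text)"}
segments = [
    ReviewSegment("p1", "experiments",
                  "Report mean and variance over several random seeds for the main results.",
                  "concrete_revision"),
    ReviewSegment("p1", "experiments", "The experiments are not convincing.", "defended"),
    ReviewSegment("p1", "writing", "Section 3 is hard to follow.", "clarification_only"),
]
pairs = build_preference_pairs(segments, papers)
print(len(pairs))  # 1: only the two "experiments" comments can be compared
```

Grouping by paper and perspective keeps each pair tied to a single conditioning input, which matches the task's framing of one comment per paper and perspective; a real pipeline might instead pair comments across reviewers or weight pairs by the rank gap.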
