Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Abstract

Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow and has advanced remarkably, academic rebuttal remains an important yet underexplored challenge. The difficulty is that rebuttal is not a simple technical debate but a complex process of strategic communication under severe information asymmetry. Current approaches therefore struggle: they largely imitate surface-level language and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM). It is operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates a strategy-grounded response. To train the agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase that equips the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase that uses a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scores agree with human preferences more closely than those of the powerful judge model GPT-4.1. Extensive experiments show that RebuttalAgent outperforms the base model by an average of 18.3% on automated metrics and also outperforms advanced proprietary models in both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only, to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
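The three TSR stages named in the abstract can be read as a chain in which each stage's output feeds the next. The following is a minimal sketch under stated assumptions, not the authors' implementation: `tsr_pipeline`, `RebuttalDraft`, `Generate`, `echo_generate`, and the prompt wording are all hypothetical names introduced here for illustration, and the language model is abstracted as any function from a prompt string to text.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function mapping a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class RebuttalDraft:
    reviewer_model: str  # inferred reviewer mental state (ToM stage)
    strategy: str        # persuasion strategy (Strategy stage)
    response: str        # strategy-grounded reply (Response stage)


def tsr_pipeline(review_comment: str, paper_context: str, generate: Generate) -> RebuttalDraft:
    """Run the three stages in order, passing each output to the next stage."""
    # Stage 1 (ToM): model what the reviewer is likely thinking.
    reviewer_model = generate(
        "Describe the reviewer's likely concerns and intent.\n"
        f"Comment: {review_comment}"
    )
    # Stage 2 (Strategy): plan how to persuade this particular reviewer.
    strategy = generate(
        "Propose a persuasion strategy for this reviewer.\n"
        f"Reviewer model: {reviewer_model}"
    )
    # Stage 3 (Response): write the reply conditioned on the chosen strategy.
    response = generate(
        "Write a rebuttal that follows the strategy.\n"
        f"Strategy: {strategy}\nComment: {review_comment}\nPaper: {paper_context}"
    )
    return RebuttalDraft(reviewer_model, strategy, response)


def echo_generate(prompt: str) -> str:
    # Stand-in for a language model: returns only the first line of the prompt.
    return prompt.splitlines()[0]
```

Calling `tsr_pipeline("The baselines are weak.", "We compare against X.", echo_generate)` exercises the control flow without a real model; in practice `generate` would wrap whatever model is being used.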
