Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
January 22, 2026
Authors: Zhitao He, Zongwei Lyu, Yi R Fung
cs.AI
Abstract
Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow and has achieved remarkable advances, academic rebuttal remains a significant and underexplored challenge. Rebuttal is not a simple technical debate but a complex process of strategic communication under severe information asymmetry. Current approaches struggle because they largely imitate surface-level linguistic features and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM). It is operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates a strategy-grounded response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase that equips the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase that uses a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scoring consistency with human preferences surpasses that of the strong judge GPT-4.1. Extensive experiments show that RebuttalAgent outperforms the base model by an average of 18.3% on automated metrics, and also outperforms advanced proprietary models in both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only, to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
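The three TSR stages named in the abstract can be pictured as a chain in which each stage conditions on the output of the previous one. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `tsr_pipeline`, `RebuttalDraft`, `Generate`, and `echo_model`, as well as the prompt wording, are hypothetical, and the model is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any text-in, text-out model call.
Generate = Callable[[str], str]


@dataclass
class RebuttalDraft:
    mental_state: str  # stage 1: inferred reviewer mental state
    strategy: str      # stage 2: persuasion strategy
    response: str      # stage 3: strategy-grounded response


def tsr_pipeline(review_comment: str, generate: Generate) -> RebuttalDraft:
    """Chain the three stages; prompts are illustrative placeholders."""
    # Stage 1: model the reviewer's mental state from the comment.
    mental_state = generate(
        "[ToM] Infer the reviewer's concerns and stance.\n"
        f"Comment: {review_comment}"
    )
    # Stage 2: formulate a strategy conditioned on the inferred state.
    strategy = generate(
        "[Strategy] Plan how to address the reviewer.\n"
        f"Comment: {review_comment}\nReviewer state: {mental_state}"
    )
    # Stage 3: generate a response grounded in the chosen strategy.
    response = generate(
        "[Response] Write the rebuttal following the strategy.\n"
        f"Comment: {review_comment}\nStrategy: {strategy}"
    )
    return RebuttalDraft(mental_state, strategy, response)


def echo_model(prompt: str) -> str:
    """Stand-in for a real model: returns a tag naming the stage."""
    stage = prompt.split("]")[0].lstrip("[")
    return f"<{stage} output>"


draft = tsr_pipeline("The baselines are weak.", echo_model)
print(draft)
```

Passing the model as a plain callable keeps the sketch independent of any particular model API; a real system would substitute its own call for `echo_model`.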