Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
January 22, 2026
Authors: Zhitao He, Zongwei Lyu, Yi R Fung
cs.AI
Abstract
Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow and has achieved remarkable advances, academic rebuttal remains a significant and underexplored challenge. Rebuttal is a complex process of strategic communication under severe information asymmetry rather than a simple technical debate. Current approaches therefore struggle: they largely imitate surface-level linguistic patterns and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM). It is operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates a strategy-grounded response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase equips the agent with ToM-based analysis and strategic planning capabilities, and a reinforcement learning phase then leverages a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scoring consistency with human preferences surpasses that of GPT-4.1, a powerful judge model. Extensive experiments show that RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics, while also outperforming advanced proprietary models in both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only, to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
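The three TSR stages named in the abstract can be read as a chain in which each stage conditions on the output of the previous one. The following is a minimal sketch of that control flow only, not the authors' implementation: the `Generate` type, the `RebuttalDraft` container, the prompt wording, and the `echo_model` stand-in are all hypothetical names introduced for illustration, and a real system would call a trained language model in place of `echo_model`.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class RebuttalDraft:
    reviewer_model: str  # stage 1: inferred reviewer mental state
    strategy: str        # stage 2: persuasion strategy
    response: str        # stage 3: strategy-grounded response


def tsr_pipeline(review_comment: str, paper_context: str, generate: Generate) -> RebuttalDraft:
    """Chain the three stages; each stage sees the outputs of the earlier ones."""
    # Stage 1 (ToM): model what the reviewer believes and cares about.
    reviewer_model = generate(
        "Infer the reviewer's concerns, expertise, and stance.\n"
        f"Comment: {review_comment}"
    )
    # Stage 2 (Strategy): plan how to persuade this particular reviewer.
    strategy = generate(
        "Plan a persuasion strategy for this reviewer.\n"
        f"Reviewer profile: {reviewer_model}\nComment: {review_comment}"
    )
    # Stage 3 (Response): write the rebuttal text, grounded in the strategy.
    response = generate(
        "Write a rebuttal that follows the strategy.\n"
        f"Strategy: {strategy}\nComment: {review_comment}\nPaper: {paper_context}"
    )
    return RebuttalDraft(reviewer_model, strategy, response)


def echo_model(prompt: str) -> str:
    """Stand-in for a language model: returns the first line of the prompt."""
    return prompt.splitlines()[0]


# Toy inputs, invented purely to exercise the control flow.
draft = tsr_pipeline("The baselines are weak.", "Toy paper context.", echo_model)
```

Passing the model as a plain callable keeps the sketch independent of any particular inference library; the two-stage training and the Rebuttal-RM evaluator described above are outside its scope.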