Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Abstract

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain expert knowledge, we find it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting as an actor (during self-reflection) tends to attribute failures to external factors, whereas an observer (during mutual auditing) attributes the same errors to internal faults. We quantify this using our new Ambiguous Failure Benchmark, which reveals that simply swapping perspectives triggers the AOA effect in over 20% of cases for most models. To tame this bias, we introduce ReTAS (Reasoning via Thesis-Antithesis-Synthesis), a model trained through dialectical alignment to enforce perspective-invariant reasoning. By integrating dialectical chain-of-thought with Group Relative Policy Optimization, ReTAS guides agents to synthesize conflicting viewpoints into an objective consensus. Experiments demonstrate that ReTAS effectively mitigates attribution inconsistency and significantly improves fault resolution rates in ambiguous scenarios.
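
As a rough illustration of the two quantitative ideas mentioned above, the sketch below shows one way the perspective-swap effect and the group-relative reward signal could be computed. It is not the paper's implementation. The `PairedJudgment` schema, the `"internal"`/`"external"` labels, and the definition of a flip (actor says external while observer says internal on the same case) are assumptions made for illustration. The advantage function follows the commonly described Group Relative Policy Optimization normalization, which standardizes each reward within its sampled group; the choice of population standard deviation and the `eps` value are also assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class PairedJudgment:
    """Attributions for one failure case under both role prompts (hypothetical schema)."""
    case_id: str
    actor_attribution: str     # "internal" or "external" (assumed label set)
    observer_attribution: str  # "internal" or "external" (assumed label set)


def aoa_flip_rate(pairs):
    """Fraction of cases where the actor blames external factors while the
    observer blames internal faults for the same failure (assumed definition)."""
    if not pairs:
        raise ValueError("need at least one paired judgment")
    flips = sum(
        1
        for p in pairs
        if p.actor_attribution == "external" and p.observer_attribution == "internal"
    )
    return flips / len(pairs)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group of responses.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std and eps are illustrative choices, not taken from the paper.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    pairs = [
        PairedJudgment("a", "external", "internal"),
        PairedJudgment("b", "internal", "internal"),
        PairedJudgment("c", "external", "external"),
        PairedJudgment("d", "external", "internal"),
    ]
    print(aoa_flip_rate(pairs))  # 2 of 4 cases flip, so this prints 0.5
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Under this assumed definition, a perspective-invariant agent would give the same attribution in both roles and score a flip rate of 0, so a lower value corresponds to less attribution inconsistency.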