

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration

June 5, 2025
Authors: Quan Shi, Carlos E. Jimenez, Shunyu Yao, Nick Haber, Diyi Yang, Karthik Narasimhan
cs.AI

Abstract

Recent advancements in AI reasoning have driven substantial improvements across diverse tasks. A critical open question is whether these improvements also yield better knowledge transfer: the ability of models to communicate their reasoning in ways humans can understand, apply, and learn from. To investigate this, we introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for evaluating human-AI knowledge transfer capabilities, and conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating the influence of model explanations on human understanding. Our findings reveal that although model benchmark performance correlates with collaborative outcomes, this relationship is notably inconsistent and features significant outliers, indicating that knowledge transfer requires dedicated optimization. Our analysis identifies behavioral and strategic factors that mediate successful knowledge transfer. We release our code, dataset, and evaluation framework to support future work on communicatively aligned models.

