Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
April 29, 2025
Authors: Roman Abramov, Felix Steinbauer, Gjergji Kasneci
cs.AI
Abstract
Transformers have achieved great success in numerous NLP tasks but continue
to exhibit notable gaps in multi-step factual reasoning, especially when
real-world knowledge is sparse. Recent advances in grokking have demonstrated
that neural networks can transition from memorizing to perfectly generalizing
once they detect underlying logical patterns - yet these studies have primarily
used small, synthetic tasks. In this paper, for the first time, we extend
grokking to real-world factual data and address the challenge of dataset
sparsity by augmenting existing knowledge graphs with carefully designed
synthetic data to raise the ratio phi_r of inferred facts to atomic facts
above the threshold required for grokking. Surprisingly, we find that even
factually incorrect synthetic data can strengthen emergent reasoning circuits
rather than degrade accuracy, as it forces the model to rely on relational
structure rather than memorization. When evaluated on multi-hop reasoning
benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA -
substantially improving over strong baselines and matching or exceeding current
state-of-the-art results. We further provide an in-depth analysis of how
increasing phi_r drives the formation of generalizing circuits inside
Transformers. Our findings suggest that grokking-based data augmentation can
unlock implicit multi-hop reasoning capabilities, opening the door to more
robust and interpretable factual reasoning in large-scale language models.
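The core augmentation idea described above can be sketched in a toy form. This is an illustrative sketch under assumed conventions, not the paper's implementation: the triple format, the two-hop composition rule, the exact definition of the ratio `phi_r` (inferred facts / atomic facts), and the random synthetic-fact generator are all assumptions made for the example.

```python
import random

def infer_two_hop(atomic_facts):
    """Derive inferred (two-hop) facts by composing atomic triples:
    (a, r1, b) and (b, r2, c) yield (a, "r1/r2", c)."""
    by_head = {}
    for h, r, t in atomic_facts:
        by_head.setdefault(h, []).append((r, t))
    inferred = set()
    for h, r1, t in atomic_facts:
        for r2, t2 in by_head.get(t, []):
            inferred.add((h, r1 + "/" + r2, t2))
    return inferred

def phi_r(atomic_facts, inferred_facts):
    """Ratio of inferred facts to atomic facts (assumed definition)."""
    return len(inferred_facts) / len(atomic_facts)

def augment(atomic_facts, target_phi, seed=0):
    """Add random synthetic atomic triples (possibly factually wrong)
    over existing entities and relations until phi_r reaches target_phi.
    Factual correctness is not checked: the point is relational structure."""
    rng = random.Random(seed)
    facts = set(atomic_facts)
    relations = sorted({r for _, r, _ in facts})
    entities = sorted({e for h, _, t in facts for e in (h, t)})
    while phi_r(facts, infer_two_hop(facts)) < target_phi:
        h, t = rng.sample(entities, 2)
        facts.add((h, rng.choice(relations), t))
    return facts
```

In this toy, two atomic facts such as ("A", "born_in", "B") and ("B", "capital_of", "C") yield one inferred fact ("A", "born_in/capital_of", "C"), giving phi_r = 0.5; `augment` then injects synthetic triples until the ratio clears the chosen threshold.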