Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
April 29, 2025
Authors: Roman Abramov, Felix Steinbauer, Gjergji Kasneci
cs.AI
Abstract
Transformers have achieved great success in numerous NLP tasks but continue
to exhibit notable gaps in multi-step factual reasoning, especially when
real-world knowledge is sparse. Recent advances in grokking have demonstrated
that neural networks can transition from memorizing to perfectly generalizing
once they detect underlying logical patterns, yet these studies have primarily
used small, synthetic tasks. In this paper, for the first time, we extend
grokking to real-world factual data and address the challenge of dataset
sparsity by augmenting existing knowledge graphs with carefully designed
synthetic data to raise the ratio phi_r of inferred facts to atomic facts
above the threshold required for grokking. Surprisingly, we find that even
factually incorrect synthetic data can strengthen emergent reasoning circuits
rather than degrade accuracy, as it forces the model to rely on relational
structure rather than memorization. When evaluated on multi-hop reasoning
benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA,
substantially improving over strong baselines and matching or exceeding current
state-of-the-art results. We further provide an in-depth analysis of how
increasing phi_r drives the formation of generalizing circuits inside
Transformers. Our findings suggest that grokking-based data augmentation can
unlock implicit multi-hop reasoning capabilities, opening the door to more
robust and interpretable factual reasoning in large-scale language models.
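
To make the ratio of inferred facts to atomic facts concrete, the following is a minimal sketch on a toy knowledge graph. It is an illustration under stated assumptions, not the authors' implementation: the entity and relation names, the helper functions, and the target value of 2.0 are all hypothetical, and the ratio is computed over the whole graph rather than per relation as the subscript in phi_r suggests.

```python
import math
from collections import defaultdict

# Toy atomic facts as (head, relation, tail) triples. All names are made up.
ATOMIC_FACTS = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "lyon"),
    ("paris", "located_in", "france"),
    ("lyon", "located_in", "france"),
    ("france", "part_of", "europe"),
]


def two_hop_facts(atomic):
    """Compose pairs of atomic facts that share a bridge entity."""
    by_head = defaultdict(list)
    for head, rel, tail in atomic:
        by_head[head].append((rel, tail))
    inferred = set()
    for head, rel1, bridge in atomic:
        # .get avoids creating empty entries for entities with no outgoing facts
        for rel2, tail in by_head.get(bridge, []):
            inferred.add((head, rel1, rel2, tail))
    return inferred


def ratio(n_inferred, n_atomic):
    """Ratio of inferred facts to atomic facts (graph-wide simplification)."""
    return n_inferred / n_atomic


def synthetic_needed(n_inferred, n_atomic, target):
    """Smallest number of extra inferred facts so the ratio reaches target."""
    return max(0, math.ceil(target * n_atomic - n_inferred))


inferred = two_hop_facts(ATOMIC_FACTS)
phi = ratio(len(inferred), len(ATOMIC_FACTS))

# TARGET is a placeholder, not the threshold reported in the paper.
TARGET = 2.0
extra = synthetic_needed(len(inferred), len(ATOMIC_FACTS), TARGET)
print(f"inferred={len(inferred)} atomic={len(ATOMIC_FACTS)} phi={phi:.2f} extra={extra}")
```

In this toy graph the ratio starts below the placeholder target, so the sketch reports how many synthetic inferred facts would have to be added to close the gap. How those facts are generated, and what threshold actually triggers grokking, is specified in the paper rather than here.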