Transformers meet Neural Algorithmic Reasoners

June 13, 2024
Authors: Wilfried Bounsi, Borja Ibarz, Andrew Dudzik, Jessica B. Hamrick, Larisa Markeeva, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković
cs.AI

Abstract

Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.
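The abstract's key architectural idea is letting the language model's tokens cross-attend to the node embeddings produced by the GNN-based NAR. Below is a minimal, hypothetical sketch of such a token-to-node cross-attention step, written in PyTorch; the class name, dimensions, and layer choices are illustrative assumptions and not the authors' released TransNAR implementation.

```python
# Illustrative sketch only: tokens (queries) cross-attend to NAR node embeddings
# (keys/values), as described in the abstract. Names and sizes are assumptions.
import torch
import torch.nn as nn


class TokenToNodeCrossAttention(nn.Module):
    """Lets Transformer token embeddings cross-attend to NAR node embeddings."""

    def __init__(self, token_dim: int, node_dim: int, num_heads: int = 8):
        super().__init__()
        # Queries come from the language-model tokens; keys/values from the GNN nodes.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=token_dim, kdim=node_dim, vdim=node_dim,
            num_heads=num_heads, batch_first=True,
        )
        self.norm = nn.LayerNorm(token_dim)

    def forward(self, tokens: torch.Tensor, nodes: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, token_dim); nodes: (batch, num_nodes, node_dim)
        attended, _ = self.cross_attn(query=tokens, key=nodes, value=nodes)
        # Residual connection keeps the original token stream intact.
        return self.norm(tokens + attended)


# Usage: fuse a 4-token text representation with a 6-node NAR representation.
tokens = torch.randn(1, 4, 256)
nodes = torch.randn(1, 6, 128)
fused = TokenToNodeCrossAttention(token_dim=256, node_dim=128)(tokens, nodes)
print(fused.shape)  # torch.Size([1, 4, 256])
```

In the two-phase setup the abstract describes, a layer like this would sit inside the Transformer stack so that textual queries can read off the NAR's algorithmic state while the token stream itself is preserved through the residual path.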