Do Large Language Models Latently Perform Multi-Hop Reasoning?
February 26, 2024
Authors: Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel
cs.AI
Abstract
We study whether Large Language Models (LLMs) latently perform multi-hop
reasoning with complex prompts such as "The mother of the singer of
'Superstition' is". We look for evidence of a latent reasoning pathway where an
LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder,
the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to
complete the prompt. We analyze these two hops individually and consider their
co-occurrence as indicative of latent multi-hop reasoning. For the first hop,
we test if changing the prompt to indirectly mention the bridge entity instead
of any other entity increases the LLM's internal recall of the bridge entity.
For the second hop, we test if increasing this recall causes the LLM to better
utilize what it knows about the bridge entity. We find strong evidence of
latent multi-hop reasoning for the prompts of certain relation types, with the
reasoning pathway used in more than 80% of the prompts. However, the
utilization is highly contextual, varying across different types of prompts.
Also, on average, the evidence for the second hop and the full multi-hop
traversal is rather moderate; only the first hop shows substantial evidence. Moreover,
we find a clear scaling trend with increasing model size for the first hop of
reasoning but not for the second hop. Our experimental findings suggest
potential challenges and opportunities for future development and applications
of LLMs.
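The two-hop structure the abstract probes can be illustrated with a toy sketch. This is purely illustrative and not the paper's method: the fact tables and the `two_hop` helper are invented stand-ins for an LLM's parametric knowledge. An explicit reasoner resolves the bridge entity first, then queries a second fact about it; the paper asks whether LLMs perform this composition latently.

```python
# Hypothetical stand-ins for an LLM's parametric knowledge (not from the paper).
singer_of = {"Superstition": "Stevie Wonder"}        # hop 1: description -> bridge entity
mother_of = {"Stevie Wonder": "Lula Mae Hardaway"}   # hop 2: fact about the bridge entity

def two_hop(song: str) -> str:
    """Resolve 'the mother of the singer of <song>' via explicit two-hop lookup."""
    bridge = singer_of[song]   # first hop: identify the bridge entity
    return mother_of[bridge]   # second hop: use knowledge about the bridge entity

print(two_hop("Superstition"))  # -> Lula Mae Hardaway
```

The paper's question is whether an LLM completing the single prompt "The mother of the singer of 'Superstition' is" internally traverses the same two steps, rather than recalling the answer as one memorized association.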