Do Large Language Models Latently Perform Multi-Hop Reasoning?

Abstract

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test whether changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test whether increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. On average, the evidence is substantial only for the first hop and rather moderate for the second hop and the full multi-hop traversal. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.
