Do Large Language Models Latently Perform Multi-Hop Reasoning?

Abstract

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway in which an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and treat their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test whether changing the prompt to indirectly mention the bridge entity, instead of any other entity, increases the LLM's internal recall of the bridge entity. For the second hop, we test whether increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence is substantial only for the first hop and rather moderate for the second hop and the full multi-hop traversal. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for the future development and applications of LLMs.
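
As a rough illustration of how the two hops might be combined, the sketch below counts the share of prompts for which both per-hop tests succeed. It is only a minimal sketch under stated assumptions: the `PromptResult` record, its two boolean fields, and the `traversal_rate` function are hypothetical names introduced here, and the abstract does not specify how each hop is scored or thresholded.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PromptResult:
    """Hypothetical per-prompt outcome; both flags are assumed to be precomputed."""
    first_hop: bool   # assumed: mentioning the bridge entity indirectly raised its internal recall
    second_hop: bool  # assumed: raising that recall improved use of knowledge about the bridge entity


def traversal_rate(results: Sequence[PromptResult]) -> float:
    """Fraction of prompts where both hops succeed, i.e. their co-occurrence."""
    if not results:
        raise ValueError("results must not be empty")
    both = sum(1 for r in results if r.first_hop and r.second_hop)
    return both / len(results)


# Made-up example values, not data from the paper.
example = [
    PromptResult(first_hop=True, second_hop=True),
    PromptResult(first_hop=True, second_hop=False),
    PromptResult(first_hop=False, second_hop=True),
    PromptResult(first_hop=True, second_hop=True),
]
rate = traversal_rate(example)
```

Grouping such records by relation type before calling `traversal_rate` would give one rate per prompt type, which is the level at which the abstract reports that utilization varies.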
