Mindstorms in Natural Language-Based Societies of Mind

Abstract

Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents, all communicating through the same universal symbolic language, are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents, some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
