Mindstorms in Natural Language-Based Societies of Mind

Abstract

Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents, some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
