Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
May 12, 2026
作者: Guinan Su, Yanwu Yang, Xueyan Li, Jonas Geiping
cs.AI
Abstract
The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer-use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents operate on message exchange formats, successively exchanging messages with users, systems, tools, and themselves (i.e., chain-of-thought) in a single stream of computation. This single-stream bottleneck in chat models leads to a number of limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking, and cannot think while reading or acting on information.
In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies a number of the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
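The decoding loop implied by this description can be sketched in a toy form. The snippet below is our own illustration, not the authors' implementation: the stream names (`user`, `thought`, `output`) and the stand-in `toy_forward` function are hypothetical. It shows the key property claimed in the abstract: at every timestep one forward pass reads from all streams and appends a token to each output stream, so the model can keep generating while new input tokens continue to arrive, with each emitted token depending only on strictly earlier timesteps.

```python
def toy_forward(context):
    """Stand-in for one forward pass of a multi-stream model: emits one
    token per output stream, computed only from tokens at strictly
    earlier timesteps (the causal dependency from the abstract)."""
    last_user = context["user"][-1] if context["user"] else "<none>"
    # A real model would condition on all streams; here we just echo the
    # latest visible user token into each output stream.
    return {"thought": f"consider:{last_user}", "output": f"act:{last_user}"}

def multi_stream_decode(user_tokens, num_steps):
    """Toy multi-stream loop: the 'thought' and 'output' streams are
    written in parallel at every step, even while 'user' tokens are
    still arriving."""
    streams = {"user": [], "thought": [], "output": []}
    for t in range(num_steps):
        # Snapshot of everything produced before timestep t (causal context).
        context = {name: list(toks) for name, toks in streams.items()}
        produced = toy_forward(context)
        streams["thought"].append(produced["thought"])
        streams["output"].append(produced["output"])
        # A new user token arrives concurrently at timestep t, so it only
        # becomes visible to the model at timestep t + 1.
        if t < len(user_tokens):
            streams["user"].append(user_tokens[t])
    return streams

streams = multi_stream_decode(["hello", "world"], num_steps=3)
```

After three steps, the `thought` stream reads `["consider:<none>", "consider:hello", "consider:world"]`: generation never blocked on reading, and each token reflects only input available at earlier timesteps.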