Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Abstract

Continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding and computer-use applications. However, the core of these systems has changed little since early instruction-tuned models such as ChatGPT. Even advanced AI agents operate on message-exchange formats, successively exchanging messages with users, the system, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This single-stream bottleneck in chat models leads to a number of limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking, and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies a number of the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
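As a rough illustration of the multi-stream idea described in the abstract, the toy loop below advances several streams in lockstep: at every timestep one token is read from each input stream and one token is written to each output stream, and the written tokens depend only on earlier timesteps. This is a minimal sketch and not the authors' implementation. The names `run_multistream`, `toy_step`, and `PAD`, the stream names `user`, `thought`, and `reply`, and the use of a padding token for idle streams are all assumptions made for this example, and the hand-written `toy_step` function merely stands in for a model forward pass.

```python
from typing import Callable, Dict, List

PAD = "<pad>"  # hypothetical filler token for a stream that is idle at a timestep

# One timestep holds exactly one token per stream (input and output streams alike).
Timestep = Dict[str, str]


def run_multistream(
    step_fn: Callable[[List[Timestep]], Dict[str, str]],
    inputs: Dict[str, List[str]],
    output_streams: List[str],
    num_steps: int,
) -> List[Timestep]:
    """Advance all streams in lockstep for num_steps timesteps."""
    history: List[Timestep] = []
    for t in range(num_steps):
        # Stand-in for one forward pass: it sees only earlier timesteps.
        written = step_fn(history)
        # Read one token from every input stream at this timestep.
        row = {
            name: (tokens[t] if t < len(tokens) else PAD)
            for name, tokens in inputs.items()
        }
        # Write one token to every output stream at the same timestep.
        for name in output_streams:
            row[name] = written.get(name, PAD)
        history.append(row)
    return history


def toy_step(history: List[Timestep]) -> Dict[str, str]:
    """Toy stand-in for a model: 'think' about the previous user token
    and 'reply' with the previous thought, both in the same step."""
    if not history:
        return {"thought": PAD, "reply": PAD}
    prev = history[-1]
    thought = prev["user"].upper() if prev["user"] != PAD else PAD
    reply = prev["thought"].lower() if prev["thought"] != PAD else PAD
    return {"thought": thought, "reply": reply}


if __name__ == "__main__":
    trace = run_multistream(
        toy_step, {"user": ["hi", "there"]}, ["thought", "reply"], num_steps=4
    )
    for t, row in enumerate(trace):
        print(t, row)
```

In this toy trace, the third timestep writes a reply to the first user token while the thought stream is already processing the second one, which is the kind of overlap a single sequential message stream cannot express.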
