Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Abstract

As large language models (LLMs) increasingly permeate daily life, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating a response. To overcome this limitation, we adapt existing LLMs into duplex models that can listen to users while generating output and dynamically adjust to provide instant feedback. Specifically, we divide the queries and responses of a conversation into several time slices and adopt a time-division-multiplexing (TDM) encoding-decoding strategy to process these slices pseudo-simultaneously. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset that consists of alternating time slices of queries and responses and covers typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluations indicate that duplex models make user-AI interactions more natural and human-like and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
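The time-slicing idea can be illustrated with a small toy loop. The sketch below is not the authors' implementation: the function names (`split_into_slices`, `duplex_loop`, `toy_respond`), the fixed slice length, and the "stop" keyword are all hypothetical, and a canned rule stands in for the LLM. It only shows how input slices and output slices might alternate on one shared timeline, so that a new user slice can change what is generated next.

```python
from typing import Callable, Iterable, List, Tuple

Slice = List[str]                 # one time slice: a short run of tokens (assumed unit)
Timeline = List[Tuple[str, Slice]]  # interleaved ("user" | "model", slice) entries


def split_into_slices(tokens: List[str], slice_len: int) -> List[Slice]:
    """Cut a token sequence into consecutive slices of at most slice_len tokens."""
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")
    return [tokens[i:i + slice_len] for i in range(0, len(tokens), slice_len)]


def duplex_loop(
    input_slices: Iterable[Slice],
    respond: Callable[[Timeline], Slice],
) -> Timeline:
    """Alternate between reading one user slice and emitting one model slice.

    An empty user slice means the user was silent during that time step.
    `respond` sees the whole interleaved history, so later input can alter
    the continuation of an answer that is already in progress.
    """
    history: Timeline = []
    for user_slice in input_slices:
        history.append(("user", user_slice))
        history.append(("model", respond(history)))
    return history


def toy_respond(history: Timeline) -> Slice:
    """Hypothetical stand-in for an LLM: emits a canned answer two tokens per
    step and goes silent once the user has said "stop"."""
    answer = "one two three four five six".split()
    if any(role == "user" and "stop" in s for role, s in history):
        return []
    emitted = sum(len(s) for role, s in history if role == "model")
    return answer[emitted:emitted + 2]


if __name__ == "__main__":
    # The query arrives as one slice; the user then stays silent for one step,
    # interrupts with "stop", and is silent again.
    user_stream = split_into_slices("count please".split(), 2) + [[], ["stop"], []]
    for role, s in duplex_loop(user_stream, toy_respond):
        print(role, s)
```

In this toy run the model stops after "four" because the interruption arrives between two output slices. In a turn-based loop the full six-token answer would have been produced before the interruption could be read.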
