Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Abstract

As large language models (LLMs) increasingly permeate daily life, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating a response. To overcome this limitation, we adapt existing LLMs into duplex models that can listen to users while generating output and dynamically adjust themselves to provide instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and adopt a time-division-multiplexing (TDM) encoding-decoding strategy to process these slices pseudo-simultaneously. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset that consists of alternating time slices of queries and responses and covers typical feedback types in instantaneous interactions. Our experiments show that, although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluations indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
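As a rough illustration of the time-slicing idea, and not the released implementation, the sketch below cuts a query into fixed-size slices and alternates them with response slices on a single timeline, which is also the shape of the alternating fine-tuning data described above. The helper names (`slice_text`, `tdm_interleave`), the word-count slice size, the role labels, and the use of an empty string for a silent slot are all assumptions made for this example.

```python
from itertools import zip_longest
from typing import List, Tuple

# Hypothetical representation of one time slot: (role, text).
Slice = Tuple[str, str]


def slice_text(text: str, words_per_slice: int) -> List[str]:
    """Cut a message into consecutive slices of at most `words_per_slice` words.

    Slicing by word count is an assumption; a real system would more
    likely slice by wall-clock time or by tokens.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_slice])
        for i in range(0, len(words), words_per_slice)
    ]


def tdm_interleave(query_slices: List[str], response_slices: List[str]) -> List[Slice]:
    """Alternate query and response slices on one shared timeline.

    Each time slot holds one user slice followed by one assistant slice.
    When one side has nothing to say in a slot, an empty string stands in
    for silence (an assumption of this sketch).
    """
    timeline: List[Slice] = []
    for query, response in zip_longest(query_slices, response_slices, fillvalue=""):
        timeline.append(("user", query))
        timeline.append(("assistant", response))
    return timeline


if __name__ == "__main__":
    query = slice_text("what is the weather like in Paris today", 3)
    response = ["", "", "Let me check."]  # the assistant stays silent at first
    for role, text in tdm_interleave(query, response):
        print(f"{role:>9}: {text!r}")
```

Because both streams share one timeline, the model side can read a new user slice between any two of its own output slices, which is what would let it react to an interruption before its full response is finished.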
