Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

July 14, 2025
Authors: Jiangkai Wu, Zhiyuan Ren, Liming Liu, Xinggong Zhang
cs.AI

Abstract

AI Video Chat is emerging as a new paradigm for Real-time Communication (RTC) in which one peer is not a human but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, it imposes tight latency constraints: MLLM inference consumes most of the response time, leaving very little for video streaming. Because network conditions are uncertain and unstable, transmission latency becomes a critical bottleneck that prevents the AI from behaving like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework that explores the shift in network requirements from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming, which recognizes how important each video region is to the chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate, which uses previous frames to substitute for lost or delayed frames while avoiding wasted bitrate. To evaluate how video streaming quality affects MLLM accuracy, we build the first benchmark for this purpose, the Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss open questions and ongoing solutions for AI Video Chat.
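The abstract names two streaming mechanisms without spelling out their algorithms. The Python below is a minimal sketch under stated assumptions, not Artic's implementation. It assumes per-region importance scores in [0, 1] are already available (how Artic derives them is not described here), and every name in it (`allocate_bitrate`, `frames_for_mllm`, `floor_kbps`, `threshold`, `interval_ms`, `budget_ms`) is hypothetical. The first function gives each region a small floor and splits the rest of the budget among regions above a threshold in proportion to importance. The second shows receiver-side substitution of the last on-time frame for a lost or late one instead of waiting for a retransmission. The frame-rate adaptation that avoids bitrate waste is not modeled.

```python
from typing import Any, List, Optional, Sequence, Tuple


def allocate_bitrate(
    importance: Sequence[float],
    total_kbps: float,
    floor_kbps: float = 5.0,
    threshold: float = 0.5,
) -> List[float]:
    """Split a bitrate budget across video regions (hypothetical helper).

    Every region keeps a small floor (an assumption of this sketch, so
    low-importance regions are coded coarsely rather than dropped); the rest
    of the budget goes only to regions whose importance is >= threshold,
    in proportion to that score.
    """
    n = len(importance)
    if n == 0:
        return []
    if total_kbps < floor_kbps * n:
        raise ValueError("budget is smaller than the per-region floor")
    alloc = [float(floor_kbps)] * n
    remainder = total_kbps - floor_kbps * n
    important = [i for i, s in enumerate(importance) if s >= threshold]
    weight = sum(importance[i] for i in important)
    if weight > 0:
        for i in important:
            alloc[i] += remainder * importance[i] / weight
    else:
        # No region clears the threshold: spread the remainder evenly.
        for i in range(n):
            alloc[i] += remainder / n
    return alloc


def frames_for_mllm(
    slots: Sequence[Optional[Tuple[Any, float]]],
    interval_ms: float,
    budget_ms: float,
) -> List[Optional[Any]]:
    """Choose the frame handed to the model for each capture slot (hypothetical helper).

    slots[i] is None if the frame captured at i * interval_ms was lost,
    otherwise (frame, arrival_ms). A frame arriving after its deadline
    (capture time + budget_ms) is treated like a lost one, and the most
    recent on-time frame is reused instead of waiting for a retransmission.
    """
    out: List[Optional[Any]] = []
    last: Optional[Any] = None
    for i, slot in enumerate(slots):
        deadline_ms = i * interval_ms + budget_ms
        if slot is not None and slot[1] <= deadline_ms:
            last = slot[0]  # on time: deliver it and remember it
        # Lost or late: `last` still holds the previous on-time frame
        # (None if nothing has arrived yet).
        out.append(last)
    return out


if __name__ == "__main__":
    print(allocate_bitrate([0.9, 0.1, 0.6, 0.0], total_kbps=1000, floor_kbps=10))
    print(frames_for_mllm([("f0", 10), None, ("f2", 500), ("f3", 160)], 50, 100))
    # prints ['f0', 'f0', 'f0', 'f3']
```

In this sketch, treating a late frame exactly like a lost one means the model never waits on a retransmission; the trade-off is that it occasionally sees a stale frame.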