STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

Abstract

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing data acquisition pipelines face a persistent trilemma: expert annotation is expensive, real-world service conversations are constrained by privacy and commercial restrictions, and static corpora quickly become stale. We propose Stream, a data-centric framework that leverages publicly available streaming media (live streams and short videos) to synthesize high-value service dialogues at scale. Stream mines authentic interaction signals from noisy streams and synthesizes conversations by integrating role-grounded persona construction with Conversational Blueprint construction; it further adopts retrieval-augmented generation (RAG) to support knowledge-aware responses. Based on Stream, we release StreamDial, a large-scale multi-domain dataset covering the Automotive, Restaurant, and Hotel domains. StreamDial contains 87,498 dialogue sessions and 1,497,320 turns in total, with an average of 17.11 turns per session and a comparable scale across domains. Each session is organized as a structured quadruplet ⟨P_u, P_a, B, H⟩ that pairs the dialogue history with explicit user and agent personas and a Conversational Blueprint, capturing realistic service behaviors such as requirement mining, constraint conflicts, negotiation, and recovery. Evaluations with automatic judges and downstream tasks show that StreamDial improves intrinsic dialogue quality over strong baselines, and that models trained with StreamDial improve Dialogue State Tracking across backbones; we further report a completed human-evaluation set and encouraging multilingual transfer on Qwen3-8B under a controlled training budget. The dataset is available at https://github.com/hitxueliang/DialogDataSetBySTREAM.
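As a rough illustration of the session quadruplet described above, the following Python sketch models one session in memory and re-derives the reported average turn count from the two totals. The class names, field names, field types, and speaker labels are assumptions made for illustration only; the abstract does not specify the released schema, and the actual files in the repository may be organized differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One utterance in the dialogue history H (hypothetical layout)."""
    speaker: str  # assumed labels: "user" or "agent"
    text: str


@dataclass
class Session:
    """Hypothetical container for the quadruplet <P_u, P_a, B, H>."""
    user_persona: str     # P_u: user persona (assumed to be free text)
    agent_persona: str    # P_a: agent persona (assumed to be free text)
    blueprint: List[str]  # B: Conversational Blueprint (assumed: ordered stages)
    history: List[Turn] = field(default_factory=list)  # H: dialogue history


def average_turns(total_turns: int, total_sessions: int) -> float:
    """Mean number of turns per session."""
    return total_turns / total_sessions


# Toy example; the content is invented and not taken from StreamDial.
example = Session(
    user_persona="First-time buyer with a fixed budget",
    agent_persona="Dealership sales consultant",
    blueprint=["requirement mining", "constraint conflict", "negotiation"],
    history=[
        Turn("user", "I need a family car, but my budget is tight."),
        Turn("agent", "Understood. How many seats do you need?"),
    ],
)

# Totals quoted in the abstract: 1,497,320 turns over 87,498 sessions.
reported_average = round(average_turns(1_497_320, 87_498), 2)
```

Rounding the ratio of the two reported totals to two decimals gives 17.11, which matches the average stated in the abstract.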