TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
April 23, 2026
Authors: Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di, Rui Wang
cs.AI
Abstract
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even a few minutes of downtime can cause massive financial losses and erode user trust. Customer incidents are a vital signal for discovering risks that monitoring misses, but extracting actionable intelligence from them remains challenging because of extreme noise, high throughput, and the semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that combines efficient indexing techniques with Large Language Models (LLMs) to make informed event-merging decisions, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. The engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment with a peak throughput of over 2,000 messages per minute and a daily volume of 300,000 messages, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data show that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and signal-to-noise ratio.
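To make the idea of a multi-stage event linking engine more concrete, here is a minimal sketch under stated assumptions; it is not the TingIS implementation. The abstract only says that efficient indexing is combined with LLM merge decisions and that incidents are extracted from a handful of diverse user descriptions. This sketch assumes (1) a cheap inverted token index for candidate recall, (2) Jaccard ranking to keep a small top-k set, and (3) a judge function that the abstract would assign to an LLM but that is stubbed here with a lexical threshold. It also assumes reports arrive already attributed to a business line (the job of the cascaded routing step). All names (`EventLinker`, `same_event_stub`, `top_k`, `min_reports`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional
import re


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens; a stand-in for a real text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Event:
    event_id: int
    business_line: str
    tokens: set = field(default_factory=set)
    reports: list = field(default_factory=list)


def same_event_stub(report_tokens: set, event: Event, threshold: float = 0.3) -> bool:
    """HYPOTHETICAL placeholder for the LLM merge decision.

    An LLM-based judge would compare the new report with the event's
    existing reports; here we only threshold lexical overlap.
    """
    return jaccard(report_tokens, event.tokens) >= threshold


class EventLinker:
    """Hypothetical three-stage linker: index lookup -> ranking -> judge."""

    def __init__(self, top_k: int = 5, min_reports: int = 3, judge=same_event_stub):
        self.top_k = top_k              # max candidates passed to the (expensive) judge
        self.min_reports = min_reports  # distinct descriptions needed before alerting
        self.judge = judge
        self.events = {}                # event_id -> Event
        # Inverted index: (business_line, token) -> ids of events containing that token.
        self.index = defaultdict(set)
        self.alerted = set()

    def _candidates(self, business_line: str, tokens: set) -> list:
        # Stage 1: cheap recall from the inverted index, scoped to one business line.
        ids = set()
        for tok in tokens:
            ids |= self.index.get((business_line, tok), set())
        # Stage 2: rank by lexical similarity and keep only the top_k.
        ranked = sorted(ids, key=lambda i: jaccard(tokens, self.events[i].tokens), reverse=True)
        return ranked[: self.top_k]

    def ingest(self, business_line: str, text: str) -> Optional[int]:
        """Link one report; return an event id the first time it reaches min_reports."""
        tokens = tokenize(text)
        target = None
        for eid in self._candidates(business_line, tokens):
            # Stage 3: merge decision (assigned to an LLM in the abstract; stubbed here).
            if self.judge(tokens, self.events[eid]):
                target = self.events[eid]
                break
        if target is None:  # no match: open a new candidate event
            target = Event(len(self.events), business_line)
            self.events[target.event_id] = target
        target.reports.append(text)
        target.tokens |= tokens
        for tok in tokens:
            self.index[(business_line, tok)].add(target.event_id)
        # Alert once, when enough *distinct* user descriptions have been linked.
        distinct = len(set(target.reports))
        if distinct >= self.min_reports and target.event_id not in self.alerted:
            self.alerted.add(target.event_id)
            return target.event_id
        return None


if __name__ == "__main__":
    linker = EventLinker(min_reports=3)
    reports = [
        "payment page fails to load checkout error",
        "checkout error when payment page opens",
        "payment checkout error page broken",
    ]
    for text in reports:
        print(linker.ingest("payments", text))  # None, None, then 0
```

The ordering of the stages reflects a common cost trade-off: the index and ranking steps are cheap and run on every message, so the expensive judge (an LLM call, in the design the abstract describes) only sees a few candidates per report. Requiring several distinct descriptions before alerting is one simple way to suppress one-off noise.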