TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
April 23, 2026
Authors: Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di, Rui Wang
cs.AI
Abstract
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even a few minutes of downtime can cause massive financial losses and erode user trust. Customer incidents are a vital signal for discovering risks that monitoring misses, but extracting actionable intelligence from them remains challenging because of extreme noise, high throughput, and the semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that combines efficient indexing techniques with Large Language Models (LLMs) to make informed event-merging decisions, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. The engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment with a peak throughput of over 2,000 messages per minute and a daily volume of 300,000 messages, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data show that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and signal-to-noise ratio.
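The abstract describes the event linking engine only at a high level. The following is a hypothetical illustration of the general two-stage pattern it names (cheap index-based recall, then an expensive merge decision), not TingIS's actual implementation. Everything concrete here is an assumption made for illustration: the token-level inverted index, the Jaccard-based `overlap_judge` standing in for the LLM call, and the `min_reports` threshold at which an event is treated as actionable.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens; a crude stand-in for real text normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_judge(history: list[str], new_text: str) -> bool:
    """Hypothetical stand-in for the LLM merge decision: Jaccard similarity >= 0.2."""
    seen, new = tokens(" ".join(history)), tokens(new_text)
    return len(seen & new) / len(seen | new) >= 0.2


@dataclass
class Event:
    event_id: int
    reports: list[str] = field(default_factory=list)
    vocab: set[str] = field(default_factory=set)
    alerted: bool = False


class EventLinker:
    """Two-stage linking: inverted-index recall, then a pluggable merge judge."""

    def __init__(self, judge: Callable[[list[str], str], bool],
                 min_reports: int = 3, min_overlap: int = 2):
        self.judge = judge              # expensive decision step (an LLM in the paper)
        self.min_reports = min_reports  # hypothetical alert threshold
        self.min_overlap = min_overlap  # shared tokens needed to become a candidate
        self.events: list[Event] = []
        self.index: dict[str, set[int]] = defaultdict(set)

    def _candidates(self, toks: set[str]) -> list[int]:
        # Stage 1: cheap recall; rank events by shared-token count, ties by id.
        hits: dict[int, int] = defaultdict(int)
        for t in toks:
            for eid in self.index.get(t, ()):
                hits[eid] += 1
        ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
        return [eid for eid, n in ranked if n >= self.min_overlap]

    def ingest(self, text: str) -> Optional[Event]:
        """Link one report; return its event the first time it crosses the alert threshold."""
        toks = tokens(text)
        # Stage 2: consult the judge only for the few recalled candidates.
        target = next((self.events[eid] for eid in self._candidates(toks)
                       if self.judge(self.events[eid].reports, text)), None)
        if target is None:
            target = Event(event_id=len(self.events))
            self.events.append(target)
        target.reports.append(text)
        for t in toks - target.vocab:
            self.index[t].add(target.event_id)
        target.vocab |= toks
        if not target.alerted and len(target.reports) >= self.min_reports:
            target.alerted = True
            return target
        return None


linker = EventLinker(judge=overlap_judge, min_reports=3)
reports = [
    "payment failed with error code 500 at checkout",
    "checkout payment keeps failing, error 500",
    "cannot log in to my account",
    "payment error 500 when I try checkout again",
]
alerts = [e for e in map(linker.ingest, reports) if e is not None]
for e in alerts:
    print(f"alert: event {e.event_id} backed by {len(e.reports)} reports")
# prints: alert: event 0 backed by 3 reports
```

With these sample reports, the three checkout/payment complaints are merged into event 0, which fires once it has three reports, while the login complaint opens a separate event. The recall stage exists so that the expensive judge (an LLM in the paper's setting) only examines a handful of candidates per message rather than every open event.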