WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Abstract

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis's utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.
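As a rough illustration of the optimizations named above (search index construction, embedding compression, and caching), the following minimal sketch shows one way such pieces could fit together. It is not WildVis's actual implementation: the toy corpus, the function names `build_index`, `search`, and `quantize`, and the choice of a one-byte-per-axis quantization are all assumptions made for this example.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy corpus; a real deployment would hold millions of conversations.
CONVERSATIONS = {
    0: "how do I write a python function",
    1: "tell me a story about a dragon",
    2: "python error when importing numpy",
}


def build_index(conversations):
    """Build an inverted index: lowercase token -> set of conversation ids."""
    index = defaultdict(set)
    for conv_id, text in conversations.items():
        for token in text.lower().split():
            index[token].add(conv_id)
    return index


# Built once up front so that queries never scan the raw text.
INDEX = build_index(CONVERSATIONS)


@lru_cache(maxsize=1024)
def search(query):
    """Return the sorted ids of conversations that contain every query token.

    Results are cached, so a repeated query skips the index lookup entirely.
    """
    tokens = query.lower().split()
    if not tokens:
        return ()
    hits = set.intersection(*(INDEX.get(token, set()) for token in tokens))
    return tuple(sorted(hits))


def quantize(points, levels=256):
    """Lossily compress precomputed 2D coordinates to integers in [0, levels - 1].

    Each axis is min-max scaled independently (an assumed scheme for this sketch).
    """
    xs, ys = zip(*points)

    def scale(value, low, high):
        if high == low:
            return 0
        return round((value - low) / (high - low) * (levels - 1))

    return [
        (scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
        for x, y in points
    ]
```

Calling `search("python numpy")` twice would hit the index once and the cache once, and `quantize` shrinks each projected point to two small integers, which reduces the payload sent to a browser at the cost of some positional precision.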