Segment Length Matters: A Study of Segment Lengths on Audio Fingerprinting Performance

January 25, 2026
Authors: Ziling Gong, Yunyan Ouyang, Iram Kamdar, Melody Ma, Hongjie Chen, Franck Dernoncourt, Ryan A. Rossi, Nesreen K. Ahmed
cs.AI

Abstract

Audio fingerprinting produces an identifiable representation of an acoustic signal that can later be used in identification and retrieval systems. To obtain a discriminative representation, the input audio is usually segmented into shorter time intervals so that local acoustic features can be extracted and analyzed. Modern neural approaches typically operate on short, fixed-duration audio segments, yet the segment duration is often chosen heuristically and rarely examined in depth. In this paper, we study how segment length affects audio fingerprinting performance. We extend an existing neural fingerprinting architecture to accommodate various segment lengths and evaluate retrieval accuracy across different segment lengths and query durations. Our results show that short segments (0.5 seconds) generally achieve better performance. Moreover, we evaluate the capacity of LLMs to recommend the best segment length and find that, among the three LLMs studied, GPT-5-mini consistently gives the best suggestions across five considerations. Our findings provide practical guidance for selecting segment duration in large-scale neural audio retrieval systems.
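
As a rough illustration of the segmentation step the abstract describes, the sketch below splits a mono waveform into fixed-length segments before any fingerprint model would see them. It is not the authors' pipeline: the function name `segment_audio`, the 8 kHz sample rate, and the 50% overlap between segments are all assumptions made for the example. Only the 0.5-second segment length comes from the abstract.

```python
import numpy as np


def segment_audio(waveform, sample_rate, segment_seconds=0.5, hop_seconds=0.25):
    """Split a 1-D waveform into fixed-length, possibly overlapping segments.

    Hypothetical helper for illustration; trailing samples that do not fill
    a whole segment are dropped.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop_len <= 0:
        raise ValueError("segment and hop durations must be positive")
    if len(waveform) < seg_len:
        # Not enough audio for even one segment.
        return np.empty((0, seg_len), dtype=waveform.dtype)
    n_segments = 1 + (len(waveform) - seg_len) // hop_len
    starts = np.arange(n_segments) * hop_len
    return np.stack([waveform[s:s + seg_len] for s in starts])


# Assumed setup: a 5-second query sampled at 8 kHz (silence as a stand-in).
sample_rate = 8000
query = np.zeros(5 * sample_rate, dtype=np.float32)

# Compare how many segments each candidate length yields at 50% overlap.
for seg_s in (0.5, 1.0, 2.0):
    segments = segment_audio(query, sample_rate, seg_s, hop_seconds=seg_s / 2)
    print(seg_s, segments.shape)
```

Under these assumptions, a shorter segment length yields more segments (and hence more fingerprints) for the same query duration, which is one of the trade-offs a segment-length study has to weigh against retrieval accuracy.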