Segment Length Matters: A Study of Segment Lengths on Audio Fingerprinting Performance

January 25, 2026
Authors: Ziling Gong, Yunyan Ouyang, Iram Kamdar, Melody Ma, Hongjie Chen, Franck Dernoncourt, Ryan A. Rossi, Nesreen K. Ahmed
cs.AI

Abstract

Audio fingerprinting produces an identifiable representation of an acoustic signal that can later be used in identification and retrieval systems. To obtain a discriminative representation, the input audio is usually segmented into shorter time intervals so that local acoustic features can be extracted and analyzed. Modern neural approaches typically operate on short, fixed-duration audio segments, yet the segment duration is often chosen heuristically and rarely examined in depth. In this paper, we study how segment length affects audio fingerprinting performance. We extend an existing neural fingerprinting architecture to accommodate various segment lengths and evaluate retrieval accuracy across different segment lengths and query durations. Our results show that short segments (0.5 seconds) generally achieve better performance. Moreover, we evaluate the ability of LLMs to recommend the best segment length and find that, among the three LLMs studied, GPT-5-mini consistently gives the best suggestions across five considerations. Our findings provide practical guidance for selecting segment duration in large-scale neural audio retrieval systems.
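
The segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `segment_audio`, the 8 kHz sample rate, the 50% overlap (hop of half a segment), and the 3-second query are all assumptions chosen for illustration. It only shows how the choice of segment length changes the number of fixed-duration segments obtained from a query of a given duration.

```python
def segment_audio(samples, sample_rate, segment_seconds, hop_seconds):
    """Split a 1-D sequence of samples into fixed-length, possibly overlapping segments.

    Hypothetical helper for illustration; not taken from the paper.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop_len <= 0:
        raise ValueError("segment and hop durations must be positive")
    segments = []
    # Trailing samples that do not fill a whole segment are dropped.
    for start in range(0, len(samples) - seg_len + 1, hop_len):
        segments.append(samples[start:start + seg_len])
    return segments


# Assumed setup: a 3-second query at 8 kHz (silence, for illustration only).
sample_rate = 8000
query = [0.0] * (3 * sample_rate)

for seg_s in (0.5, 1.0, 2.0):
    # Assumed hop of half a segment (50% overlap).
    segs = segment_audio(query, sample_rate, seg_s, hop_seconds=seg_s / 2)
    print(f"segment length {seg_s} s -> {len(segs)} segments")
```

Under these assumptions, a shorter segment length yields more segments (and therefore more fingerprints) for the same query duration, which is one reason segment length and query duration need to be evaluated together.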