SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Abstract

Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix in 17 categories of real-world environmental noise at controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.
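
The abstract mentions mixing environmental noise into synthesized speech at controlled SNR levels but does not spell out the procedure. The following is a minimal sketch of one common way to do this, under the assumption that SNR is defined as the ratio of mean speech power to mean noise power in decibels. The function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) and the choice to loop the noise clip to the utterance length are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Hypothetical helper: assumes SNR = 10 * log10(P_speech / P_noise).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise clip so that it covers the whole utterance (an assumption).
    looped = [noise[i % len(noise)] for i in range(len(speech))]

    p_speech = mean_power(speech)
    p_noise = mean_power(looped)
    if p_noise == 0.0:
        raise ValueError("noise clip is silent")

    # Solve 10 * log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, looped)]


def measured_snr_db(speech: Sequence[float], mixed: Sequence[float]) -> float:
    """Recover the SNR of a mixture given the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Because the gain is computed from the measured powers, the same clean utterance and noise clip always yield the same mixture at a given SNR, which is the property that makes a noise sweep from quiet to highly noisy conditions reproducible.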

