Convex Low-Resource Accent-Robust Language Detection in Speech Recognition

Abstract

Globalization and multiculturalism continue to produce increasingly diverse speech varieties. Yet current spoken dialogue systems frequently fail on under-represented dialects and accents, often misidentifying the input language and causing cascading failures in downstream dialogue tasks. Addressing this dialectal variation under low-resource constraints remains an open challenge, because standard fine-tuning is computationally expensive and prone to overfitting on high-dimensional speech data. We propose Convex Language Detection (CLD), a novel framework that integrates theoretically grounded convex optimization techniques into the spoken dialogue system pipeline. The method is implemented efficiently with a multi-GPU Alternating Direction Method of Multipliers (ADMM) solver in JAX, providing global optimality guarantees and fast, polynomial-time training. Theoretically, we prove that the proposed convex objective induces certified margin stability and provide guarantees against feature perturbations. Empirically, we demonstrate sample efficiency and robustness to dialectal variation in the input, achieving 97–98% accuracy in challenging low-resource regimes. Our open-source package is available at https://pypi.org/project/jaxcld/
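For readers unfamiliar with ADMM, the following is a minimal single-device sketch of the technique in JAX. It does not reproduce the CLD objective or the jaxcld API, neither of which is specified in this abstract. As a stand-in, it solves an l1-regularized least-squares problem, a standard convex example. The names `admm_lasso` and `soft_threshold` and the parameters `lam`, `rho`, and `num_iters` are hypothetical choices made for illustration.

```python
import jax
import jax.numpy as jnp


def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1: elementwise shrinkage toward zero.
    return jnp.sign(v) * jnp.maximum(jnp.abs(v) - kappa, 0.0)


def admm_lasso(A, b, lam=0.1, rho=1.0, num_iters=200):
    """Illustrative ADMM for: minimize 0.5*||A x - b||^2 + lam*||z||_1  s.t.  x = z.

    This is a generic convex example, not the CLD objective.
    """
    A = jnp.asarray(A)
    b = jnp.asarray(b)
    n = A.shape[1]
    # The x-update is a ridge-like linear system with a fixed matrix.
    M = A.T @ A + rho * jnp.eye(n, dtype=A.dtype)
    Atb = A.T @ b

    def step(carry, _):
        x, z, u = carry
        # x-update: minimize the smooth term plus the quadratic penalty.
        x = jnp.linalg.solve(M, Atb + rho * (z - u))
        # z-update: proximal step on the l1 term.
        z = soft_threshold(x + u, lam / rho)
        # Scaled dual update: accumulate the constraint residual x - z.
        u = u + x - z
        return (x, z, u), None

    zeros = jnp.zeros(n, dtype=A.dtype)
    (x, z, u), _ = jax.lax.scan(step, (zeros, zeros, zeros), None, length=num_iters)
    return z
```

Because the problem is convex, the iterates approach a global minimizer regardless of the zero initialization. This sketch re-solves the linear system at every iteration for simplicity; a practical solver would typically cache a factorization of `M` and, for multi-GPU use, shard the data across devices.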