Spatial Speech Translation: Translating Across Space With Binaural Hearables

Abstract

Imagine being in a crowded space where people speak a different language, and having hearables that transform the auditory space into your native language while preserving the spatial cues for all speakers. We introduce spatial speech translation, a novel concept for hearables that translate speakers in the wearer's environment while maintaining the direction and unique voice characteristics of each speaker in the binaural output. To achieve this, we tackle several technical challenges spanning blind source separation, localization, real-time expressive translation, and binaural rendering that preserves the speaker directions in the translated audio, all while achieving real-time inference on Apple M2 silicon. Our proof-of-concept evaluation with a prototype binaural headset shows that, unlike existing models, which fail in the presence of interference, we achieve a BLEU score of up to 22.01 when translating between languages, despite strong interference from other speakers in the environment. User studies further confirm the system's effectiveness in spatially rendering the translated speech in previously unseen real-world reverberant environments. More broadly, this work marks the first step towards integrating spatial perception into speech translation.
