HI-TransPA: Hearing Impairments Translation Personal Assistant
November 13, 2025
Authors: Zhiming Ma, Shiyu Gan, Junhao Zhao, Xianming Li, Qingyun Pan, Peidong Wang, Mingjun Pan, Yuhao Mo, Jiajie Cheng, Chengxin Chen, Zhonglun Cao, Chonghan Liu, Shi Cheng
cs.AI
Abstract
To provide a unified and flexible solution for daily communication among hearing-impaired individuals, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with high-frame-rate lip dynamics, enabling both translation and dialogue within a single multimodal framework. To tackle the challenges of noisy and heterogeneous raw data and the limited adaptability of existing Omni-Models to hearing-impaired speech, we construct a comprehensive preprocessing and curation pipeline that detects facial landmarks, isolates and stabilizes the lip region, and quantitatively assesses multimodal sample quality. These quality scores guide a curriculum learning strategy that first trains on clean, high-confidence samples and progressively incorporates harder cases to strengthen model robustness. We further adopt a SigLIP encoder combined with a Unified 3D-Resampler to efficiently encode high-frame-rate lip motion. Experiments on our purpose-built HI-Dialogue dataset show that HI-TransPA achieves state-of-the-art performance in both literal accuracy and semantic fidelity. This work establishes a foundation for applying Omni-Models to assistive communication technology, providing an end-to-end modeling framework and essential processing tools for future research.
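As a rough illustration of the quality-score-guided curriculum described in the abstract, the sketch below orders multimodal samples by their curated quality score and builds cumulative training pools, starting from clean, high-confidence samples and progressively admitting harder ones. This is not the authors' code: the `Sample` fields, the score range, the thresholds, and the number of stages are illustrative assumptions.

```python
# Minimal sketch of quality-score-guided curriculum scheduling.
# Assumes each sample already carries a quality score in [0, 1]
# produced by a preprocessing/curation pipeline (hypothetical fields).
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    audio_path: str       # indistinct speech clip
    lip_video_path: str   # stabilized high-frame-rate lip crop
    transcript: str       # target text
    quality: float        # multimodal quality score, higher = cleaner


def curriculum_stages(samples: List[Sample],
                      thresholds=(0.9, 0.7, 0.0)) -> List[List[Sample]]:
    """Return cumulative training pools: stage 0 contains only clean,
    high-confidence samples; later stages add progressively harder ones."""
    ordered = sorted(samples, key=lambda s: s.quality, reverse=True)
    return [[s for s in ordered if s.quality >= t] for t in thresholds]


# Usage: train first on curriculum_stages(data)[0], then continue
# fine-tuning on the larger pools from the later stages.
```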