From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms
August 14, 2025
Authors: Zhaokun Jiang, Ziyin Zhang
cs.AI
Abstract
Recent advances in machine learning have spurred growing interest in automated interpreting quality assessment. However, existing research examines language use quality insufficiently, achieves unsatisfactory modeling performance due to data scarcity and imbalance, and makes little effort to explain model predictions. To address these gaps, we propose a multi-dimensional modeling framework that integrates feature engineering, data augmentation, and explainable machine learning. The approach prioritizes explainability over "black box" prediction by using only construct-relevant, transparent features and by conducting Shapley value (SHAP) analysis. Our results demonstrate strong predictive performance on a novel English-Chinese consecutive interpreting dataset, identifying BLEURT and CometKiwi scores as the strongest predictors of fidelity, pause-related features as the key indicators of fluency, and Chinese-specific phraseological diversity metrics as the most important for language use. Overall, by placing particular emphasis on explainability, we present a scalable, reliable, and transparent alternative to traditional human evaluation, one that enables detailed diagnostic feedback for learners and supports self-regulated learning in ways that automated scores alone cannot.
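
The abstract uses BLEURT and CometKiwi scores as input features rather than as final metrics. As an illustrative sketch (not the authors' code), the snippet below shows one way such scores could be computed for English-Chinese segment pairs, using the Hugging Face `evaluate` wrapper for BLEURT and the `unbabel-comet` package for the reference-free CometKiwi model. The example sentences are invented; BLEURT requires an extra install of Google's `bleurt` package, and the `Unbabel/wmt22-cometkiwi-da` checkpoint is gated behind a license acceptance on the Hugging Face Hub.

```python
# Sketch of extracting fidelity-oriented features for English-Chinese
# interpreting segments. Assumed installs:
#   pip install evaluate unbabel-comet
#   pip install git+https://github.com/google-research/bleurt.git
import evaluate
from comet import download_model, load_from_checkpoint

sources = ["The global economy is slowing down."]  # English source
interpretations = ["全球经济正在放缓。"]               # interpreter's Chinese output
references = ["全球经济增长正在放缓。"]                # reference translation

# BLEURT: reference-based learned metric (higher = closer to the reference).
bleurt = evaluate.load("bleurt", module_type="metric")
bleurt_scores = bleurt.compute(
    predictions=interpretations, references=references
)["scores"]

# CometKiwi: reference-free quality estimation from source + output alone.
ckpt = download_model("Unbabel/wmt22-cometkiwi-da")  # gated checkpoint
kiwi = load_from_checkpoint(ckpt)
kiwi_scores = kiwi.predict(
    [{"src": s, "mt": m} for s, m in zip(sources, interpretations)],
    batch_size=8,
    gpus=0,  # set to 1 if a GPU is available
).scores

print(bleurt_scores, kiwi_scores)
```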
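
To make the explainable-machine-learning step concrete, here is a minimal sketch, under assumptions of my own: a gradient-boosted regressor trained on a small table of hypothetical transparent features (the feature names and the synthetic data are invented for illustration), followed by the kind of SHAP attribution the abstract describes. The paper's actual models, features, and data differ.

```python
# Minimal SHAP pipeline sketch: fit a model on construct-relevant features,
# then attribute predictions to those features. All data here is synthetic.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Hypothetical feature table: one row per interpreted segment.
X = pd.DataFrame({
    "bleurt": rng.random(n),            # semantic similarity to reference
    "cometkiwi": rng.random(n),         # reference-free quality estimate
    "pause_ratio": rng.random(n),       # proportion of silent time
    "phrase_diversity": rng.random(n),  # phraseological diversity measure
})
# Toy fidelity score driven mostly by the two MT-quality features.
y = 3 * X["bleurt"] + 2 * X["cometkiwi"] + rng.normal(0, 0.1, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # shape: (n_samples, n_features)

# Global importance: mean absolute SHAP value per feature.
importance = np.abs(shap_values).mean(axis=0)
for name, val in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {val:.3f}")
```

On this synthetic data, `bleurt` and `cometkiwi` dominate the mean-|SHAP| ranking by construction, mirroring the shape (though not the substance) of the fidelity finding reported in the abstract.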