Could Thinking Multilingually Empower LLM Reasoning?
April 16, 2025
Authors: Changjiang Gao, Xu Huang, Wenhao Zhu, Shujian Huang, Lei Li, Fei Yuan
cs.AI
Abstract
Previous work indicates that large language models exhibit a significant "English bias", i.e., they often perform better when tasks are presented in English. Interestingly, we have observed that using certain other languages in reasoning tasks can yield better performance than English. However, this phenomenon remains under-explored. In this paper, we explore the upper bound of harnessing multilingualism in reasoning tasks, and suggest that multilingual reasoning offers an upper bound that is both significantly higher (by nearly 10 Acc@k points) and more robust (tolerant of variations in translation quality and language choice) than that of English-only reasoning. Besides analyzing the reasons behind this upper bound and the challenges in reaching it, we also find that common answer selection methods cannot achieve this upper bound, due to their limitations and biases. These insights could pave the way for future research aimed at fully harnessing the potential of multilingual reasoning in LLMs.
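
To make the Acc@k upper bound concrete, below is a minimal sketch of how such a metric could be computed over multilingual prompt variants. It assumes (this detail is not spelled out in the abstract) that a question counts as solved if any of the k language variants produces the correct answer; the function and data names are hypothetical and for illustration only.

```python
# Minimal sketch of an Acc@k-style upper bound over multilingual prompts.
# Assumption (not stated in the abstract): a question counts as solved if ANY
# of the k language variants yields the correct answer. All names are hypothetical.

def acc_at_k(answers_per_language: dict, gold: list, languages: list) -> float:
    """answers_per_language maps a language code to one model answer per question."""
    solved = 0
    for i, gold_answer in enumerate(gold):
        # Count the question as solved if any of the k languages gets it right.
        if any(answers_per_language[lang][i] == gold_answer for lang in languages):
            solved += 1
    return solved / len(gold)

# Hypothetical usage: two questions, each asked via English, German, and Chinese prompts.
answers = {
    "en": ["4", "7"],
    "de": ["4", "9"],
    "zh": ["5", "9"],
}
gold = ["4", "9"]
print(acc_at_k(answers, gold, ["en", "de", "zh"]))  # 1.0: every question is solved by some language
```

Under this reading, Acc@k measures the headroom available if an oracle could always pick the best-performing language per question, which is why the abstract contrasts it with the weaker results of common answer selection methods.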