Could Thinking Multilingually Empower LLM Reasoning?
April 16, 2025
Authors: Changjiang Gao, Xu Huang, Wenhao Zhu, Shujian Huang, Lei Li, Fei Yuan
cs.AI
Abstract
Previous work indicates that large language models exhibit a significant "English bias", i.e., they often perform better when tasks are presented in English. Interestingly, we have observed that using certain other languages in reasoning tasks can yield better performance than English. However, this phenomenon remains under-explored. In this paper, we explore the upper bound of harnessing multilingualism in reasoning tasks, and suggest that multilingual reasoning offers an upper bound that is both significantly higher (by nearly 10 Acc@k points) and more robust (tolerant of variations in translation quality and language choice) than that of English-only reasoning. Besides analyzing the reasons behind this upper bound and the challenges in reaching it, we also find that common answer selection methods cannot achieve this upper bound, due to their limitations and biases. These insights could pave the way for future research aimed at fully harnessing the potential of multilingual reasoning in LLMs.
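
To make the Acc@k upper bound concrete, below is a minimal sketch of how such a metric could be computed over multilingual prompt variants. It assumes (this detail is not spelled out in the abstract) that a question counts as solved if any of the k language variants produces the correct answer; the function and data names are hypothetical and for illustration only.

```python
# Minimal sketch of an Acc@k-style upper bound over multilingual prompts.
# Assumption (not stated in the abstract): a question counts as solved if ANY
# of the k language variants yields the correct answer. All names are hypothetical.

def acc_at_k(answers_per_language: dict, gold: list, languages: list) -> float:
    """answers_per_language maps a language code to one model answer per question."""
    solved = 0
    for i, gold_answer in enumerate(gold):
        # Count the question as solved if any of the k languages gets it right.
        if any(answers_per_language[lang][i] == gold_answer for lang in languages):
            solved += 1
    return solved / len(gold)

# Hypothetical usage: two questions, each asked via English, German, and Chinese prompts.
answers = {
    "en": ["4", "7"],
    "de": ["4", "9"],
    "zh": ["5", "9"],
}
gold = ["4", "9"]
print(acc_at_k(answers, gold, ["en", "de", "zh"]))  # 1.0: every question is solved by some language
```

Under this reading, Acc@k measures the headroom available if an oracle could always pick the best-performing language per question, which is why the abstract contrasts it with the weaker results of common answer selection methods.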