

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

March 23, 2025
Authors: Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, Abdul Sattar
cs.AI

Abstract

Large Language Models (LLMs) have significantly advanced various fields, particularly coding, mathematical reasoning, and logical problem solving. However, a critical question remains: Do these mathematical reasoning abilities persist when LLMs are presented with culturally adapted math problems? Specifically, how do LLMs perform when faced with math problems embedded in cultural contexts that have no significant representation in mainstream web-scale AI training data? To explore this, we generated six synthetic cultural datasets from GSM8K, a widely used benchmark for assessing LLMs' mathematical reasoning skills. While preserving the mathematical logic and numerical values of the original GSM8K test set, we modified cultural elements such as personal names, food items, and place names. These culturally adapted datasets provide a more reliable framework for evaluating LLMs' mathematical reasoning under shifting cultural contexts. Our findings reveal that LLMs struggle with math problems when cultural references change, even though the underlying mathematical structure remains constant. Smaller models exhibit greater performance drops than larger models. Interestingly, our results also suggest that cultural familiarity can enhance mathematical reasoning: even models with no explicit mathematical training but with exposure to relevant cultural contexts sometimes outperform larger, mathematically proficient models on culturally embedded math problems. This study highlights the impact of cultural context on the mathematical reasoning abilities of LLMs, underscoring the need for more diverse and representative training data to improve robustness in real-world applications. The benchmark datasets and the script for reproducing the results are available at https://github.com/akarim23131/Lost_in_Cultural_Translation.
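
As a rough illustration of the adaptation described above, the sketch below applies whole-word entity substitutions (personal names, food items, place names) to a GSM8K-style problem while leaving every numerical value untouched. The substitution map, the example problem, and the `adapt_problem` helper are hypothetical; the authors' actual generation pipeline and target cultures are not specified in this abstract.

```python
import re

# Hypothetical mapping from original surface forms to culturally adapted
# counterparts (illustrative only; not the authors' actual substitutions).
CULTURAL_MAP = {
    "Natalia": "Amina",
    "muffins": "samosas",
    "farmers' market": "local bazaar",
}

def adapt_problem(text: str, mapping: dict[str, str]) -> str:
    """Swap cultural entities as whole words; numbers are never modified,
    so the underlying mathematical logic is preserved."""
    for original, adapted in mapping.items():
        # \b word boundaries prevent partial-word replacements.
        text = re.sub(rf"\b{re.escape(original)}\b", adapted, text)
    return text

problem = ("Natalia baked 48 muffins and sold 20 of them at the "
           "farmers' market. How many muffins does Natalia have left?")
print(adapt_problem(problem, CULTURAL_MAP))
# Amina baked 48 samosas and sold 20 of them at the local bazaar.
# How many samosas does Amina have left?  (the answer is still 48 - 20 = 28)
```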
