No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding
February 3, 2026
Authors: Vynska Amalia Permadi, Xingwei Tan, Nafise Sadat Moosavi, Nikos Aletras
cs.AI
Abstract
Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.