No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

February 3, 2026
Authors: Vynska Amalia Permadi, Xingwei Tan, Nafise Sadat Moosavi, Nikos Aletras
cs.AI

Abstract

Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.