Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
February 24, 2025
Authors: Zhenglin Wang, Jialong Wu, Pengfei LI, Yong Jiang, Deyu Zhou
cs.AI
Abstract
Temporal reasoning is fundamental to human cognition and is crucial for
various real-world applications. While recent advances in Large Language Models (LLMs)
have demonstrated promising capabilities in temporal reasoning, existing
benchmarks primarily rely on rule-based construction, lack contextual depth,
and involve a limited range of temporal entities. To address these limitations,
we introduce Chinese Time Reasoning (CTM), a benchmark designed to evaluate
LLMs on temporal reasoning within the extensive scope of Chinese dynastic
chronology. CTM emphasizes cross-entity relationships, pairwise temporal
alignment, and contextualized and culturally grounded reasoning, providing a
comprehensive evaluation. Extensive experimental results reveal the challenges
posed by CTM and highlight potential avenues for improvement.