UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
November 14, 2023
Authors: Wenting Zhao, Justin T. Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr
cs.AI
Abstract
Language technologies that accurately model the dynamics of events must
perform commonsense reasoning. Existing work evaluating commonsense reasoning
focuses on making inferences about common, everyday situations. To instead
investigate the ability to model unusual, unexpected, and unlikely situations,
we explore the task of uncommonsense abductive reasoning. Given a piece of
context with an unexpected outcome, this task requires reasoning abductively to
generate a natural language explanation that makes the unexpected outcome more
likely in the context. To this end, we curate and release a new English
language corpus called UNcommonsense. We characterize the differences between
the performance of human explainers and the best performing large language
models, finding that model-enhanced human-written explanations achieve the
highest quality by trading off between specificity and diversity. Finally, we
experiment with several online imitation learning algorithms to train open and
accessible language models on this task. When compared with the vanilla
supervised fine-tuning approach, these methods consistently reduce lose rates
on both common and uncommonsense abductive reasoning, as judged by human
evaluators.
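To make the task setup concrete, here is a minimal illustrative sketch (not drawn from the UNcommonsense corpus; the instance, field names, and prompt template are all hypothetical): each example pairs a context with an unexpected outcome, and the goal is to generate an explanation that makes the outcome more plausible given the context.

```python
# Hypothetical sketch of the uncommonsense abductive reasoning task format.
# The dataclass fields and prompt template below are illustrative assumptions,
# not the paper's actual data schema.
from dataclasses import dataclass


@dataclass
class AbductiveInstance:
    context: str      # an ordinary setup
    outcome: str      # an unexpected or unlikely result
    explanation: str  # a hypothesis that bridges context and outcome


def build_prompt(inst: AbductiveInstance) -> str:
    """Format one instance as a generation prompt (hypothetical template)."""
    return (
        f"Context: {inst.context}\n"
        f"Unexpected outcome: {inst.outcome}\n"
        "Explanation that makes the outcome more likely:"
    )


# A made-up example instance, purely for illustration.
example = AbductiveInstance(
    context="A man ordered a large pizza for dinner.",
    outcome="He went to bed hungry.",
    explanation="The delivery driver got lost and the pizza never arrived.",
)
print(build_prompt(example))
```

A model trained on this task would be asked to produce the `explanation` field given only `context` and `outcome`; the paper evaluates such generations with human judges.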