Backtracing: Retrieving the Cause of the Query
March 6, 2024
Authors: Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya Demszky
cs.AI
Abstract
Many online content portals allow users to ask questions to supplement their
understanding (e.g., of lectures). While information retrieval (IR) systems may
provide answers for such user queries, they do not directly assist content
creators -- such as lecturers who want to improve their content -- in
identifying segments that _caused_ a user to ask those questions. We introduce the task of
backtracing, in which systems retrieve the text segment that most likely caused
a user query. We formalize three real-world domains for which backtracing is
important in improving content delivery and communication: understanding the
cause of (a) student confusion in the Lecture domain, (b) reader curiosity in
the News Article domain, and (c) user emotion in the Conversation domain. We
evaluate the zero-shot performance of popular information retrieval methods and
language modeling methods, including bi-encoder, re-ranking and
likelihood-based methods and ChatGPT. While traditional IR systems retrieve
semantically relevant information (e.g., details on "projection matrices" for a
query "does projecting multiple times still lead to the same point?"), they
often miss the causally relevant context (e.g., the lecturer states "projecting
twice gets me the same answer as one projection"). Our results show that there
is room for improvement on backtracing and that it requires new retrieval
approaches. We hope our benchmark serves to improve future retrieval systems
for backtracing, spawning systems that refine content generation and identify
linguistic triggers influencing user queries. Our code and data are
open-sourced: https://github.com/rosewang2008/backtracing.
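
To make the task setup concrete, here is a minimal sketch of the kind of zero-shot bi-encoder baseline the abstract refers to: embed the user query and each candidate source segment, then return the highest-similarity segment as the predicted cause. This is not the paper's implementation; the sentence-transformers model name, the example lecture segments, and the query are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): a zero-shot bi-encoder baseline for
# backtracing. Assumes the sentence-transformers library; the model name and
# the example texts below are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

# Candidate source segments, e.g. sentences from a lecture transcript.
segments = [
    "A projection matrix maps any vector onto the column space of A.",
    "Projecting twice gets me the same answer as one projection.",
    "Next we will look at least squares and the normal equations.",
]
query = "Does projecting multiple times still lead to the same point?"

# Embed the query and all segments, then rank segments by cosine similarity.
segment_embeddings = model.encode(segments, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, segment_embeddings)[0]

best = int(scores.argmax())
print(f"Predicted causal segment: {segments[best]} (score={scores[best].item():.3f})")
```

As the abstract notes, a purely semantic ranker of this kind may prefer the segment about "projection matrices" over the causally relevant statement that "projecting twice gets me the same answer as one projection", which is exactly the gap the backtracing benchmark is meant to measure.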