Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
July 17, 2024
Authors: To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani
cs.AI
Abstract
In language modeling, models augmented with retrieval components have emerged
as a promising way to address several challenges in natural language
processing (NLP), including knowledge grounding, interpretability, and
scalability. Although this line of work has focused primarily on NLP, we posit
that the retrieval-enhancement paradigm can be extended to a broader spectrum
of machine learning (ML) domains, such as computer vision, time series
prediction, and computational biology. Therefore, this work introduces a formal
framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by
synthesizing the literature across various ML domains using consistent
notation, which is missing from the current literature. We also find that,
while a number of studies employ retrieval components to augment their models,
there is a lack of integration with foundational Information Retrieval (IR)
research. We
bridge this gap between the seminal IR research and contemporary REML studies
by investigating each component that comprises the REML framework. Ultimately,
the goal of this work is to equip researchers across various disciplines with a
comprehensive, formally structured framework of retrieval-enhanced models,
thereby fostering interdisciplinary future research.