Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

Abstract

In language modeling, models augmented with retrieval components have emerged as a promising way to address several challenges in natural language processing (NLP), including knowledge grounding, interpretability, and scalability. Although work to date has focused primarily on NLP, we posit that the retrieval-enhancement paradigm extends to a broader spectrum of machine learning (ML), such as computer vision, time series prediction, and computational biology. This work therefore introduces a formal framework for the paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across ML domains under a consistent notation, which the current literature lacks. We also find that while many studies employ retrieval components to augment their models, these studies lack integration with foundational Information Retrieval (IR) research. We bridge this gap between seminal IR research and contemporary REML studies by investigating each component of the REML framework. Ultimately, the goal of this work is to equip researchers across disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.

