LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency

April 19, 2024
Authors: Zhaodonghui Li, Haitao Yuan, Huiming Wang, Gao Cong, Lidong Bing
cs.AI

Abstract

Query rewrite, which aims to generate more efficient queries by altering a SQL query's structure without changing the query result, has long been an important research problem. To maintain equivalence between the rewritten query and the original one, traditional query rewrite methods rewrite queries by following fixed rewrite rules. However, some problems remain. First, existing methods for finding the optimal choice or sequence of rewrite rules are still limited, and the search process is resource-intensive. Methods that discover new rewrite rules typically require complicated proofs of structural logic or extensive user interaction. Second, current query rewrite methods usually rely heavily on DBMS cost estimators, which are often inaccurate. In this paper, we address these problems by proposing a novel query rewrite method named LLM-R2, which adopts a large language model (LLM) to propose possible rewrite rules for a database rewrite system. To further improve the inference ability of the LLM in recommending rewrite rules, we train a contrastive model with a curriculum to learn query representations and select effective query demonstrations for the LLM. Experimental results show that our method significantly improves query execution efficiency and outperforms the baseline methods. In addition, our method is highly robust across different datasets.
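The flow described in the abstract (pick demonstrations whose query representation is close to the input query, show them to the LLM, and accept only rules from a fixed rule set) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors, the rule names, the prompt wording, and every function name here are hypothetical placeholders, and the LLM call itself is left out.

```python
import math

# Hypothetical demonstration pool: (query embedding, SQL text, rules that helped).
# The vectors stand in for the output of a learned query encoder.
DEMO_POOL = [
    ([0.9, 0.1, 0.0],
     "SELECT * FROM a JOIN b ON a.id = b.id WHERE a.x > 5",
     ["push_filter_below_join"]),
    ([0.0, 0.2, 0.9],
     "SELECT DISTINCT y FROM t GROUP BY y",
     ["remove_redundant_distinct"]),
]

# Placeholder rule names; a real system would use its rewrite engine's rule set.
CANDIDATE_RULES = [
    "push_filter_below_join",
    "remove_redundant_distinct",
    "merge_projections",
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demonstrations(query_vec, pool, k=1):
    """Return the k pool entries most similar to the input query embedding."""
    ranked = sorted(pool, key=lambda demo: cosine(query_vec, demo[0]), reverse=True)
    return ranked[:k]


def build_prompt(sql, demos, rules):
    """Assemble a few-shot prompt asking the LLM to choose rewrite rules."""
    lines = ["Candidate rewrite rules: " + ", ".join(rules)]
    for _, demo_sql, demo_rules in demos:
        lines.append("Example query: " + demo_sql)
        lines.append("Rules applied: " + ", ".join(demo_rules))
    lines.append("Query to rewrite: " + sql)
    lines.append("Rules to apply:")
    return "\n".join(lines)


def parse_rules(llm_reply, rules):
    """Keep only names from the fixed rule set, in the order the LLM gave them."""
    proposed = [name.strip() for name in llm_reply.split(",")]
    return [name for name in proposed if name in rules]
```

Because the LLM only selects rule names and the rule-based rewrite system performs the actual transformation, any name outside the known rule set is simply dropped in `parse_rules`, so the rewritten query stays equivalent to the original as long as each rule is itself equivalence-preserving.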

