PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation

Abstract

Large language models (LLMs) have shown increasing effectiveness in Text-to-SQL tasks. However, a closely related problem, Cross-System SQL Translation (a.k.a. SQL-to-SQL), which adapts a query written for one database system (e.g., MySQL) into an equivalent query for another system (e.g., ClickHouse), is of great practical importance but remains underexplored. Existing SQL benchmarks are not well suited for SQL-to-SQL evaluation because they (1) focus on a limited set of database systems (often just SQLite) and (2) cannot capture many system-specific SQL dialects (e.g., customized functions, data types, and syntax rules). Thus, in this paper, we introduce PARROT, a Practical And Realistic BenchmaRk for CrOss-System SQL Translation. PARROT comprises 598 translation pairs from 38 open-source benchmarks and real-world business services, specifically prepared to challenge system-specific SQL understanding (e.g., LLMs achieve lower than 38.53% accuracy on average). We also provide multiple benchmark variants, including PARROT-Diverse with 28,003 translations (for extensive syntax testing) and PARROT-Simple with 5,306 representative samples (for focused stress testing), covering 22 production-grade database systems. To promote future research, we release a public leaderboard and source code at: https://code4db.github.io/parrot-bench/.
