Speculative Ad-hoc Querying

Abstract

Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin even before the user has finished typing, allowing results to appear almost instantly. We propose SpeQL, a system that leverages Large Language Models (LLMs) to predict likely queries based on the database schema, the user's past queries, and their incomplete query. Since exact query prediction is infeasible, SpeQL speculates on partial queries in two ways: 1) it predicts the query structure to compile and plan queries in advance, and 2) it precomputes temporary tables that are much smaller than the original database but are still predicted to contain all information necessary to answer the user's final query. Additionally, SpeQL continuously displays results for speculated queries and subqueries in real time, aiding exploratory analysis. A utility/user study showed that SpeQL improved task completion time, and participants reported that its speculative display of results helped them discover patterns in the data more quickly. In the study, SpeQL improved users' query latency by up to 289× and kept the overhead reasonable, at $4 per hour.
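
The second speculation strategy, precomputing a small temporary table expected to contain everything the final query needs, can be illustrated with a minimal sketch. This is not SpeQL's implementation: it uses Python's built-in sqlite3 module in place of a real data warehouse, replaces LLM-based prediction with a hard-coded "predicate typed so far", and the table, column, and function names (orders, spec_tmp, precompute_superset) are hypothetical. It only shows why a superset table gives the same answer as the base table once the user's final query is at least as restrictive as the speculated predicate.

```python
import sqlite3


def precompute_superset(conn, base_table, stable_predicate, temp_name="spec_tmp"):
    """Materialize the rows matching the part of the WHERE clause typed so far.

    Hypothetical helper: a real system would derive `stable_predicate`
    from a predicted query rather than receive it as a literal string.
    """
    conn.execute(f"DROP TABLE IF EXISTS temp.{temp_name}")
    conn.execute(
        f"CREATE TEMP TABLE {temp_name} AS "
        f"SELECT * FROM {base_table} WHERE {stable_predicate}"
    )
    return temp_name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 40.0), (4, "EU", 5.0)],
)

# The user has typed "... FROM orders WHERE region = 'EU' AND" so far,
# so we speculatively materialize the rows that satisfy that prefix.
tmp = precompute_superset(conn, "orders", "region = 'EU'")

# The final query only adds a stricter predicate, so the temporary table
# still contains every row it needs.
final_sql = "SELECT SUM(amount) FROM {t} WHERE region = 'EU' AND amount > 8"
fast = conn.execute(final_sql.format(t=tmp)).fetchone()[0]
slow = conn.execute(final_sql.format(t="orders")).fetchone()[0]
tmp_rows = conn.execute(f"SELECT COUNT(*) FROM {tmp}").fetchone()[0]
```

Here the final query scans 3 rows in the temporary table instead of 4 in the base table. The benefit would come from the same effect at much larger scale, provided the speculated predicate really is implied by the final query. If the user later edits the typed prefix, the temporary table no longer qualifies and would have to be rebuilt.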
