Speculative Ad-hoc Querying

Abstract

Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin before the user has finished typing, so that results appear almost instantly. We propose SpeQL, a system that uses Large Language Models (LLMs) to predict likely queries from the database schema, the user's past queries, and their incomplete query. Since exact query prediction is infeasible, SpeQL speculates on partial queries in two ways: 1) it predicts the query structure in order to compile and plan queries in advance, and 2) it precomputes temporary tables that are much smaller than the original database but are still predicted to contain all the information needed to answer the user's final query. Additionally, SpeQL continuously displays results for speculated queries and subqueries in real time, aiding exploratory analysis. A utility/user study showed that SpeQL improved task completion time, and participants reported that its speculative display of results helped them discover patterns in the data more quickly. In the study, SpeQL improved users' query latency by up to 289× while keeping the overhead reasonable, at $4 per hour.
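The second form of speculation, precomputing a smaller temporary table that still contains everything the final query needs, can be illustrated with a minimal sketch. This is not SpeQL's implementation: the `orders` table, the `spec_orders` temporary table, the helper names, and the predicted predicate `year = 2023` are all hypothetical, and the LLM prediction step is replaced by a hard-coded guess. The sketch only shows why a superset table is safe: as long as the final query's filter is at least as strict as the speculated one, running it against the temporary table returns the same rows as running it against the base table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, year INTEGER, amount REAL)")
    rows = [
        ("east", 2022, 50.0),
        ("east", 2023, 120.0),
        ("east", 2023, 80.0),
        ("west", 2023, 200.0),
        ("west", 2022, 300.0),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def precompute_superset(conn: sqlite3.Connection, temp_name: str,
                        base_table: str, predicate: str) -> None:
    """Materialize the rows matching a speculated predicate into a temp table.

    The predicate stands in for what a predictor might guess from a partial
    query such as `SELECT ... FROM orders WHERE year = 2023 AND`.
    Identifiers are interpolated directly, so this is for trusted demo input only.
    """
    conn.execute(f"DROP TABLE IF EXISTS {temp_name}")
    conn.execute(
        f"CREATE TEMP TABLE {temp_name} AS "
        f"SELECT * FROM {base_table} WHERE {predicate}"
    )


def run_final_query(conn: sqlite3.Connection, table: str) -> list:
    """Run the user's completed query against the given table."""
    sql = (
        f"SELECT region, SUM(amount) FROM {table} "
        "WHERE year = 2023 AND amount > 100 "
        "GROUP BY region ORDER BY region"
    )
    return conn.execute(sql).fetchall()


conn = build_demo_db()

# Speculate while the user is still typing: keep only the 2023 rows.
precompute_superset(conn, "spec_orders", "orders", "year = 2023")

# Once the query is complete, answer it from the smaller temp table.
direct = run_final_query(conn, "orders")
speculative = run_final_query(conn, "spec_orders")
temp_row_count = conn.execute("SELECT COUNT(*) FROM spec_orders").fetchone()[0]
```

In this toy case the temporary table holds three of the five base rows, and both queries return the same per-region totals. If the user's final query had turned out to filter on a different year, the speculation would be a miss and the query would have to fall back to the base table.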
