GTR: An SQL Generator With Transition Representation in Cross-Domain Database Systems

IEEE Trans Neural Netw Learn Syst. 2023 Sep 6:PP. doi: 10.1109/TNNLS.2023.3309824. Online ahead of print.

Abstract

Recent studies have focused on using natural language (NL) to automatically retrieve useful data from database (DB) systems. As an important component of autonomous DB systems, the NL-to-SQL technique can assist DB administrators in writing high-quality SQL statements and make persons with no SQL background knowledge learn complex SQL languages. However, existing studies cannot deal with the issue that the expression of NL inevitably mismatches the implementation details of SQLs, and the large number of out-of-domain (OOD) words makes it difficult to predict table columns. In particular, it is difficult to accurately convert NL into SQL in an end-to-end fashion. Intuitively, it facilitates the model to understand the relations if a "bridge" transition representation (TR) is employed to make it compatible with both NL and SQL in the phase of conversion. In this article, we propose an automatic SQL generator with TR called GTR in cross-domain DB systems. Specifically, GTR contains three SQL generation steps: 1) GTR learns the relation between questions and DB schemas; 2) GTR uses a grammar-based model to synthesize a TR; and 3) GTR predicts SQL from TR based on the rules. We conduct extensive experiments on two commonly used datasets, that is, WikiSQL and Spider. On the testing set of the Spider and WikiSQL datasets, the results show that GTR achieves 58.32% and 71.29% exact matching accuracy which outperforms the state-of-the-art methods, respectively.