Relying on a completely automated process, eRAGโs SQL generation capabilities ensure precise SQL queries and highly accurate responses from multiple data sources simultaneously. ย
eRAG manages the flow from natural language to SQL as a pipeline:ย
- Checks the Cache
- Generates the SQL statement
- Validates the syntax
- Executes the statement in natural languageย ย
Step #1 – Checks the Cache
Before engaging with the LLMs, eRAG first examines queries that are stored in the cache. If the examination shows a high similarity score, the cached questionโs corresponding SQL is matched to the newly asked question. As that stored statement has already been vetted, there is no need for additional validation and the statement can be executed.ย
Step #2 Generates the SQL Statement
Generating the SQL statement involves:ย
- Schema filtering: to obtain a clean and to-the-point SQL statement, eRAG analyzes the query to identify which tables to include with the context sent to the LLM, and filter out irrelevant tables.
- Schema linking: to enhance the accuracy of the SQL statement with additional context, eRAG maps the named entities in the question to relevant schema entities such as tables and fields.
- Example selection: this method, used here during inference, is similar to the few-shot learning method which is used to train machine learning models. Three types of examples to be included in the prompt include:
- Examples addressing inherent problems.
- Class-specific examples, as per classification of the specific question.
- Previously asked questions with a medium-high similarity score. Although these questions may not be similar enough to match the newly asked question, the corresponding SQL is close enough to provide an example as to how the right SQL statement should look.ย
Using all the above ingredients plus some best practices as to how to best address the LLM, eRAG generates the prompt and sends it to the LLM to generate an SQL statement candidate.ย
Step #3 Validates
With a SQL statement candidate in hand, eRAG moves down the pipeline to validation:
- Syntax check: eRAG validates the SQL statement for syntax. As SQL has many different dialects, this check is run in context of the target SQL platform.ย
- Sanity test: eRAG evaluates the results against a set of rules, then executes the SQL statement with a limit on the size of the result set, ensuring correctness.
- Retry (if required): if a validation fails, the LLM has one more shot at generating another SQL statement candidate. This prompt will also include the reason for the retry.ย
Step #4 Executes
Execution of the SQL statement comprises:ย
- Converting the results back to natural language, as part of the conversation experience
- Making the result dataset available to the userย
- Updating the cache