Skip to content
GigaSpaces Logo GigaSpaces Logo
  • Products
    • Our Products
      • eRAG
        • GenAI Catalyst
        • Instant Data
        • Respond Proactively
        • Act Autonomously
      • Smart DIH
      • XAP
    • Solutions for
      • Pharma
      • Procurement
    • vid-icon

      Conventional RAG Falls Short with Enterprise Databases

      Watch the Webinaricon
  • Solutions
    • Business Solutions
      • Digital Innovation Over Legacy Systems
      • Integration Data Hub
      • API Scaling
      • Hybrid / Multi-cloud Integration
      • Customer 360
      • Industry Solutions
      • Retail
      • Financial Services
      • Insurance Companies
    • vid-icon

      Massimo Pezzini, Gartner Analyst Emeritus

      5 Top Use Cases For Driving Business With Data Hub Architecture

      Watch the Webinaricon
  • How it Works
    • eRAG Technology Overview
      • AI-Ready, IT-Friendly
      • Semantic Reasoning
      • Questions to SQL Queries
      • Asked & Answered in Natural Language
      • Multiple Data Sources
      • Proactive AI Governance
    • vid-icon

      Ensure GenAI compliance and governance

      Read the Whitepapericon
  • Success Stories
    • By Use Case
      • Procurement
      • Operations
      • Budget Management
      • Sales Operations
      • Service Providers
      • Utilities Management
      • Restaurant Management
    • By Industry
      • Logistics
      • Pharma
      • Education
      • Retail
      • Shipping
      • Energy
      • Hospitality
    • vid-icon

      Monkey See, AI Do - All about CUA

      Watch Webinaricon
  • Resources
    • Content Hub
      • Case Studies
      • Webinars
      • Q&As
      • Videos
      • Whitepapers & Brochures
      • Events
      • Glossary
      • Blog
      • FAQs
      • Technical Documentation
    • vid-icon

      Taking the AI leap from RAG to TAG

      Read the Blogicon
  • Company
    • Our Company
      • About
      • Customers
      • Management
      • Board Members
      • Investors
      • News
      • Press Releases
      • Careers
    • col2
      • Partners
      • OEM Partners
      • System Integrators
      • Technology Partners
      • Value Added Resellers
      • Support & Services
      • Services
      • Support
    • vid-icon

      GigaSpaces, IBM & AWS make AI safer

      Read Howicon
  • Book a Demo
  • Products
    • Our Products
      • eRAG
        • GenAI Catalyst
        • Instant Data
        • Respond Proactively
        • Act Autonomously
      • Smart DIH
      • XAP
    • Solutions for
      • Pharma
      • Procurement
    • vid-icon

      Conventional RAG Falls Short with Enterprise Databases

      Watch the Webinaricon
  • Solutions
    • Business Solutions
      • Digital Innovation Over Legacy Systems
      • Integration Data Hub
      • API Scaling
      • Hybrid / Multi-cloud Integration
      • Customer 360
      • Industry Solutions
      • Retail
      • Financial Services
      • Insurance Companies
    • vid-icon

      Massimo Pezzini, Gartner Analyst Emeritus

      5 Top Use Cases For Driving Business With Data Hub Architecture

      Watch the Webinaricon
  • How it Works
    • eRAG Technology Overview
      • AI-Ready, IT-Friendly
      • Semantic Reasoning
      • Questions to SQL Queries
      • Asked & Answered in Natural Language
      • Multiple Data Sources
      • Proactive AI Governance
    • vid-icon

      Ensure GenAI compliance and governance

      Read the Whitepapericon
  • Success Stories
    • By Use Case
      • Procurement
      • Operations
      • Budget Management
      • Sales Operations
      • Service Providers
      • Utilities Management
      • Restaurant Management
    • By Industry
      • Logistics
      • Pharma
      • Education
      • Retail
      • Shipping
      • Energy
      • Hospitality
    • vid-icon

      Monkey See, AI Do - All about CUA

      Watch Webinaricon
  • Resources
    • Content Hub
      • Case Studies
      • Webinars
      • Q&As
      • Videos
      • Whitepapers & Brochures
      • Events
      • Glossary
      • Blog
      • FAQs
      • Technical Documentation
    • vid-icon

      Taking the AI leap from RAG to TAG

      Read the Blogicon
  • Company
    • Our Company
      • About
      • Customers
      • Management
      • Board Members
      • Investors
      • News
      • Press Releases
      • Careers
    • col2
      • Partners
      • OEM Partners
      • System Integrators
      • Technology Partners
      • Value Added Resellers
      • Support & Services
      • Services
      • Support
    • vid-icon

      GigaSpaces, IBM & AWS make AI safer

      Read Howicon
  • Book a Demo
  • Products
    • Our Products
      • eRAG
        • GenAI Catalyst
        • Instant Data
        • Respond Proactively
        • Act Autonomously
      • Smart DIH
      • XAP
    • Solutions for
      • Pharma
      • Procurement
  • Solutions
    • Digital Innovation Over Legacy Systems
    • Integration Data Hub
    • API Scaling
    • Hybrid/Multi-cloud Integration
    • Customer 360
    • Retail
    • Financial Services
    • Insurance Companies
  • How it Works
    • eRAG Technology Overview
      • AI-Ready, IT-Friendly
      • Semantic Reasoning
      • Questions to SQL Queries
      • Asked & Answered in Natural Language
      • Multiple Data Sources
      • Governance
  • Success Stories
    • By Use Case
      • Procurement
      • Operations
      • Budget Management
      • Sales Operations
      • Service Providers
      • Utilities Management
      • Restaurant Management
    • By Industry
      • Logistics
      • Pharma
      • Education
      • Retail
      • Shipping
      • Energy
      • Hospitality
  • Resources
    • Webinars
    • Videos
    • Q&As
    • Whitepapers & Brochures
    • Customer Case Studies
    • Events
    • Glossary
    • FAQs
    • Blog
    • Technical Documentation
  • Company
    • About
    • Customers
    • Management
    • Board Members
    • Investors
    • News
    • Press Releases
    • Careers
    • Partners
      • OEM Partners
      • System Integrators
      • Technology Partners
      • Value Added Resellers
    • Support & Services
      • Services
      • Support
  • Pricing
  • Book a Demo

Why LLMs Need Help for Accurate SQL Generation at Scale

224

Subscribe for Updates
Close
Back

BLOG

Why LLMs Need Help for Accurate SQL Generation at Scale

Meni Meller
February 27, 2025 /
8min. read

Large Language Models (LLMs) are powerful tools for natural language processing, but they have inherent limitations when dealing with structured relational data. Many assume that providing an LLM with a database schema and a natural language question is enough for it to generate correct SQL queries. However, this approach overlooks the fundamental nature of LLMs: they are designed to understand and generate natural language, not to inherently grasp the intricacies of database schemas, table structures, and field relationships.

Contents

Toggle
  • The Challenge: One-Word Labels with Deep Business Meaning
    • Example: Misinterpreting Schema Naming Conventions
  • The Importance of a Semantic Reasoning Layer
    • Key Components of Semantic Reasoning
    • Example: Semantic Reasoning in Action
  • Addressing the Inconsistency of LLM Responses
  • The Role of Dynamic Semantic Updates
    • Example: Handling Code Lists Dynamically
  • Scalability and Performance Considerations
  • Conclusion

This blog explores why an LLM alone is insufficient for accurate SQL generation and why a proper solution requires a semantic reasoning layer that offers rich contextual understanding of database objects. By providing natural language explanations of tables, fields, keys, and values, filtering relevant schema elements dynamically, and continuously updating semantic information, we can create a system that bridges the gap between natural language and structured data. This ensures higher accuracy, reliability, and efficiency—critical requirements for any enterprise-grade data querying solution.

We will use examples from publicly available ERP schemas and sample databases to illustrate these points, emphasizing the need for structured semantic enrichment in AI-driven data querying.

The Challenge: One-Word Labels with Deep Business Meaning

Relational databases are designed to store structured, highly accurate data, with each table, view, field, and key carefully named according to business logic. However, these names are typically single words or abbreviations that carry tremendous business meaning—meaning that an LLM cannot infer from the schema alone.

For example, in an ERP system like SAP or Oracle ERP, a table named ORDERS might store sales transactions, while CUSTOMERS holds client information. The field STATUS in ORDERS might store values like P for pending, C for completed, and X for canceled. Without a semantic reasoning layer explaining what these values represent, an LLM may misinterpret the meaning and generate incorrect SQL queries.

Example: Misinterpreting Schema Naming Conventions

Consider an SQL query request:

User Query: “Retrieve all completed orders placed by VIP customers last month.”

A simple schema might look like this:

TABLE ORDERS (
    ORDER_ID INT,
    CUSTOMER_ID INT,
    STATUS CHAR(1),
    ORDER_DATE DATE
);
TABLE CUSTOMERS (
    CUSTOMER_ID INT,
    CUSTOMER_TYPE CHAR(1)
);

A naive LLM-based approach might generate:

SELECT * FROM ORDERS WHERE STATUS = ‘COMPLETED’ AND ORDER_DATE >= ‘2024-01-01’; This query is incorrect for two reasons:

  1. The STATUS field does not use COMPLETED as a value; it uses C.
  2. The CUSTOMER_TYPE field in CUSTOMERS determines VIP status, but the LLM did not join ORDERS with CUSTOMERS.

Without semantic reasoning, the LLM fails to interpret that STATUS = ‘C’ represents completion and that CUSTOMER_TYPE = ‘V’ indicates a VIP customer. A well-designed system should provide the LLM with a mapping of field values and business rules to ensure accuracy.

 

The Importance of a Semantic Reasoning Layer

To bridge the gap between relational database structures and natural language understanding, a semantic reasoning layer is necessary. This layer provides contextual information about tables, fields, and relationships, allowing the LLM to generate more accurate queries.

Key Components of Semantic Reasoning

  1. Table and Field Descriptions: Each table, field, and key should have a natural language explanation. For example:
    • ORDERS.STATUS → “Order status: ‘P’ = Pending, ‘C’ = Completed, ‘X’ = Canceled.”
    • CUSTOMERS.CUSTOMER_TYPE → “Customer type: ‘V’ = VIP, ‘R’ = Regular.”
  1. Business Rules and Constraints: The reasoning layer should define logical relationships, such as “Only CUSTOMERS with CUSTOMER_TYPE = ‘V’ should be included when searching for VIP customers.”
  2. Example Queries and Mappings: Providing examples of valid SQL queries mapped to similar natural language questions can guide the LLM in generating accurate queries.
  3. Filtering Mechanisms: If a database has 1,000 tables, sending the entire schema to an LLM for every question is inefficient. Instead, the system should intelligently filter and send only the relevant tables, keys, and constraints based on the question.

Example: Semantic Reasoning in Action

If we enhance our system with semantic reasoning, it can generate the correct SQL for the previous example:

SELECT * FROM ORDERS
JOIN CUSTOMERS ON ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID
WHERE ORDERS.STATUS = 'C'
AND CUSTOMERS.CUSTOMER_TYPE = 'V'
AND ORDER_DATE >= '2024-01-01';

This improved accuracy comes from having explicit mappings and constraints in the semantic reasoning layer.

Addressing the Inconsistency of LLM Responses

Unlike relational databases, which are deterministic and return exact results, LLMs generate responses probabilistically. This means that the same input might not always yield the same SQL query. A robust AI-driven SQL generation system must mitigate this inconsistency by implementing:

  • Query Validation Mechanisms: Pre & Post-processing SQL statements and outputs to check for errors, expected resultset size and classifying the query’s complexity before execution.
  • Audit and Security Controls: Ensuring queries adhere to security policies and do not expose sensitive data, providing organizations with control, visibility and management capabilities to align with known security best practices as well as descriptive AI to understand why a specific prompt generated that specific query.
  • Feedback Loops: Allowing users to refine queries and correct mistakes, improving future responses.

The Role of Dynamic Semantic Updates

A key advantage of a semantic reasoning layer is its ability to evolve. Businesses frequently update database schemas, add new fields, and change value mappings. The reasoning layer should support dynamic updates to maintain accuracy.

Example: Handling Code Lists Dynamically

An ERP system might store order priority levels as follows:

PRIORITY_LEVEL
----------------
H = High
M = Medium
L = Low

If a company introduces a new priority level (E = Emergency), the semantic layer should be updated so the LLM understands this new classification. Without this update, queries involving priority levels might become inaccurate.

Scalability and Performance Considerations

For an AI-powered SQL system to be scalable and efficient, it must:

  • Optimize Schema and Example Filtering: Instead of passing 1,000 tables to the LLM, the system should identify and send only the necessary four tables for a specific query.
  • Use Caching and Preprocessing: Frequently asked questions should have precomputed query templates to reduce LLM processing time.

Implement Efficient Parsing Pipelines: The system should intelligently parse user queries, identify required tables, and construct optimal SQL queries.

Conclusion

LLMs are not inherently designed to interpret database schemas accurately. Relying solely on sending an LLM a schema and expecting perfect SQL queries is a limited and somewhat flawed approach. Instead, a robust AI-driven SQL generation system must incorporate at least a semantic reasoning layer, such as that provided by GigaSpaces eRAG that enriches database metadata with business context. To provide a holistic solution addressing security, compliance, scale, performance and accuracy – an end-to-end design, such as eRAG, must be implemented.

Tags:

GenAI RAG
Meni Meller

Sales engineering director, EMEA.

Meni Meller, Director of Sales Engineering EMEA, joined GigaSpaces in 2022. Meni has extensive experience in solution engineering and SE leadership, in the areas of data, security and enterprise architectures. Meni worked for Oracle, Sun Microsystems and others.

All Posts (3)

Share this Article

Subscribe to Our Blog



PRODUCTS & SOLUTIONS

  • Products
    • eRAG
    • Smart DIH
    • XAP
  • Our Technology
    • Semantic Reasoning
    • Natural language to SQL
    • RAG for Structured Data
    • In-Memory Data Grid
    • Data Integration
    • Data Operations by Multiple Access Methods
    • Unified Data Model
    • Event-Driven Architecture

RESOURCES

  • Resource Hub
  • Webinars
  • Q&As
  • Blogs
  • FAQs
  • Videos
  • Whitepapers & Brochures
  • Customer Case Studies
  • Events
  • Use Cases
  • Analyst Reports
  • Technical Documentation

COMPANY

  • About
  • Customers
  • Management
  • Board Members
  • Investors
  • News
  • Careers
  • Contact Us
  • Book A Demo
  • Partners
  • OEM Partners
  • System Integrators
  • Value Added Resellers
  • Technology Partners
  • Support & Services
  • Services
  • Support
Copyright © GigaSpaces 2025 All rights reserved | Privacy Policy | Terms of Use
LinkedInXFacebookYouTube
Manage your privacy

To provide the best experiences, we and our partners use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us and our partners to process personal data such as browsing behavior or unique IDs on this site and show (non-) personalized ads. Not consenting or withdrawing consent, may adversely affect certain features and functions.

Click below to consent to the above or make granular choices. Your choices will be applied to this site only. You can change your settings at any time, including withdrawing your consent, by using the toggles on the Cookie Policy, or by clicking on the manage consent button at the bottom of the screen.

Functional Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
Statistics

Marketing

Features
Always active

Always active
  • Manage options
  • Manage services
  • Manage {vendor_count} vendors
  • Read more about these purposes
Manage options
  • {title}
  • {title}
  • {title}
Manage your privacy
To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
Functional Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
Statistics

Marketing

Features
Always active

Always active
  • Manage options
  • Manage services
  • Manage {vendor_count} vendors
  • Read more about these purposes
Manage options
  • {title}
  • {title}
  • {title}
Skip to content
Open toolbar Accessibility Tools

Accessibility Tools

  • Increase TextIncrease Text
  • Decrease TextDecrease Text
  • GrayscaleGrayscale
  • High ContrastHigh Contrast
  • Negative ContrastNegative Contrast
  • Light BackgroundLight Background
  • Links UnderlineLinks Underline
  • Readable FontReadable Font
  • Reset Reset
  • SitemapSitemap

Hey
tell us what
you need

You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices, and how we are committed to protecting and respecting your privacy, please review our Privacy Policy.

Hey , tell us what you need

You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices, and how we are committed to protecting and respecting your privacy, please review our Privacy Policy.

Oops! Something went wrong, please check email address (work email only).
Thank you!
We will get back to You shortly.