Still shipping your entire schema to a Text-to-SQL agent on every request? You're losing both accuracy and money 💸 Here's how a knowledge graph fixes both.
Title: How a Neo4j semantic layer makes your Text-to-SQL agent smarter and cheaper
URL:
💸 Overview
This post explains how to use a knowledge graph (Neo4j) as a semantic layer to make Text-to-SQL agents both smarter and cheaper. Instead of dumping the full schema into every prompt, the agent retrieves only the subgraph relevant to the question, a GraphRAG approach.
❓ Challenges Solved
Most implementations store schema info in static YAML or Markdown and send the whole thing on every request. That creates three serious issues.
・High token cost: transmitting the entire schema repeatedly is expensive
・Contextual noise: irrelevant tables degrade accuracy and trigger hallucinations
・Poor maintainability: flat files go stale as business semantics evolve
💡 Methodology & Proposed Approach
The graph stores database structure (schemas, tables, columns, types), constraints, column dictionaries, a business glossary, and usage patterns. The agent retrieves only relevant context in three steps.
・Semantic similarity search: vector indices identify matching columns and terms
・Shortest-path search: find join paths connecting the identified tables
・Additional context: gather schema definitions, business terms, and sample values
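The first two retrieval steps can be sketched in plain Python. This is a minimal sketch, not the post's implementation: it assumes hand-written toy embeddings in place of a real embedding model and Neo4j vector index, and a hypothetical customers/orders/order_items/products schema. Step 1 ranks columns by cosine similarity to the question embedding; step 2 runs breadth-first search over foreign-key join edges to find the shortest join path. In Neo4j itself, both steps would run as Cypher queries against the graph.

```python
import math
from collections import deque

# Hypothetical toy embeddings; a real system would get these from an
# embedding model and store them in a Neo4j vector index.
COLUMN_EMBEDDINGS = {
    "customers.region": [0.9, 0.1, 0.0],
    "orders.amount": [0.1, 0.9, 0.1],
    "products.category": [0.0, 0.2, 0.9],
}

# Hypothetical foreign-key join edges between tables (treated as undirected).
JOIN_EDGES = {
    ("customers", "orders"): "customers.id = orders.customer_id",
    ("orders", "order_items"): "orders.id = order_items.order_id",
    ("order_items", "products"): "order_items.product_id = products.id",
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_columns(query_vec, k=2):
    """Step 1: return the k columns most similar to the question embedding."""
    ranked = sorted(
        COLUMN_EMBEDDINGS,
        key=lambda col: cosine(query_vec, COLUMN_EMBEDDINGS[col]),
        reverse=True,
    )
    return ranked[:k]

def neighbors(table):
    """Yield (adjacent_table, join_condition) pairs for a table."""
    for (a, b), cond in JOIN_EDGES.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond

def shortest_join_path(start, goal):
    """Step 2: BFS over the join graph; returns join conditions, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cond]))
    return None
```

For a question embedded as `[0.5, 0.5, 0.0]`, `top_columns` picks `customers.region` and `orders.amount`, and `shortest_join_path("customers", "products")` returns the three join conditions through `orders` and `order_items`. Only those tables and joins, not the whole schema, would then go into the prompt.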
Retrieval completes in milliseconds and returns JSON containing the relevant tables and the join paths between them.
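Step 3 and the JSON output might look roughly like this. It is a sketch under assumptions: `TABLE_INFO` and `GLOSSARY` are hypothetical stand-ins for what a graph lookup of schema definitions, sample values, and business terms could return, and the JSON field names are illustrative rather than the post's actual format.

```python
import json

# Hypothetical metadata a graph lookup (step 3) might return per table.
TABLE_INFO = {
    "customers": {
        "columns": {"id": "INTEGER", "region": "TEXT"},
        "sample_values": {"region": ["EMEA", "APAC"]},
    },
    "orders": {
        "columns": {"id": "INTEGER", "customer_id": "INTEGER", "amount": "NUMERIC"},
        "sample_values": {},
    },
}

# Hypothetical business glossary entry.
GLOSSARY = {"revenue": "SUM(orders.amount) excluding refunds"}

def build_context(tables, join_path, terms):
    """Assemble only the retrieved subgraph into a compact JSON payload for the LLM."""
    payload = {
        "tables": [{"name": t, **TABLE_INFO[t]} for t in tables],
        "join_path": join_path,
        # Keep only glossary terms that actually exist in the graph.
        "business_terms": {term: GLOSSARY[term] for term in terms if term in GLOSSARY},
    }
    return json.dumps(payload, indent=2)
```

Calling `build_context(["customers", "orders"], ["customers.id = orders.customer_id"], ["revenue"])` yields a payload with two tables, one join, and one glossary entry, which is the kind of small, question-specific context that keeps prompts short.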
🌍 Use Cases / Experimental Results
The post reports improvements that matter directly for production.
・Token reduction: 20-30% on average, and up to 10x fewer tokens on simple queries
・Accuracy (multi-table joins): ~98% (Neo4j) vs ~90% (YAML)
・Accuracy (complex CTEs with window functions): ~94% (Neo4j) vs ~85% (YAML)
・Token use scales with query complexity: ~1,800 (simple), ~5,000 (multi-join), ~7,300 (advanced)
The graph captures dynamic usage patterns like join frequencies and behavioral relationships, enabling continuous improvement that static files simply can't model.
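One way join frequencies could feed back into retrieval is to count how often each table pair is joined in a log of accepted queries and prefer well-used joins during path search. This sketch is an assumption, not the post's method: it swaps plain hop-count search for Dijkstra with edge cost `1 / (1 + count)`, and the query log and the direct `customers`-`products` edge are hypothetical.

```python
import heapq
from collections import Counter

# Hypothetical log: each entry lists the table pairs one accepted SQL query joined.
QUERY_LOG = [
    [("customers", "orders")],
    [("customers", "orders"), ("orders", "order_items")],
    [("orders", "order_items"), ("order_items", "products")],
    [("customers", "orders")],
    [("orders", "order_items"), ("order_items", "products")],
]

# Hypothetical candidate join edges, e.g. derived from foreign keys.
CANDIDATE_EDGES = [
    ("customers", "orders"),
    ("orders", "order_items"),
    ("order_items", "products"),
    ("customers", "products"),  # valid but never used in the log
]

def join_frequencies(log):
    """Count how often each unordered table pair appears as a join."""
    counts = Counter()
    for query in log:
        for a, b in query:
            counts[frozenset((a, b))] += 1
    return counts

def preferred_path(start, goal, edges, freq):
    """Dijkstra over tables where frequently used joins are cheaper."""
    graph = {}
    for a, b in edges:
        weight = 1.0 / (1 + freq[frozenset((a, b))])  # Counter returns 0 if unseen
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))
    heap = [(0.0, start, [start])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None
```

With the sample log, the path from `customers` to `products` goes through `orders` and `order_items` (total cost about 0.83) instead of the never-used direct edge (cost 1.0). With an empty `Counter()`, every edge costs 1 and the direct edge wins, which is what a static, frequency-blind schema file would suggest.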
#TextToSQL #KnowledgeGraph