VOOZH about

URL: https://thenewstack.io/sql-schema-generation-with-large-language-models/

⇱ SQL Schema Generation With Large Language Models - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-05-04 05:00:20
SQL Schema Generation With Large Language Models
tutorial,
Data / Large Language Models

SQL Schema Generation With Large Language Models

We discover that mapping one domain (publishing) into another (the domain-specific language of SQL) works heavily to an LLM's strengths.
May 4th, 2024 5:00am by David Eastman
👁 Featued image for: SQL Schema Generation With Large Language Models
Image via Unsplash+. 

I’ve looked at both regex and JSON persistence generation with LLMs, but it is Structured Query Language (SQL) that many believe is handled well by AI. To help celebrate SQL’s 50th birthday, let’s talk tables, introducing technical terminology as we need it. However, I don’t want to simply test queries against existing tables. The world of relational databases starts with the schema. A schema describes a group of tables that interact to allow SQL queries to answer questions about real world system models. We use various constraints to control how the tables relate to each other. In this example, I’ll develop a schema about books, authors and publishers. Then we’ll see if an LLM can reproduce the work. We start with the relationships between our things. A book is written by an author and published by a publisher. Indeed, the publication of a book defines the relationship between author and publisher. So in concrete terms, we want to produce a result like this:

Book Author Publisher Release Date
The Wasp Factory Iain Banks Abacus 16 February 1984
Consider Phlebas Iain M. Banks Orbit 14 April 1988
This reads nicely (we will return to it later), but the table itself would be a poor way to maintain more information. If the publisher’s name is just a string, it might need to be entered many times — which is both inefficient and error prone. The same for author. Those of a literary bent will know that the author (Iain Banks) is the same for both books, but he used a slightly altered pseudonym when writing science fiction. What if the book was released again later by a different publisher? To be certain to distinguish between the two publication events, we would need both the book’s titles and the release date — so our primary key or unique identification must include both. We want the system to reject any attempts to enter two books with the same title and publication date. Instead of using one big table, let’s use three tables and references to them where needed. One for authors, one for publishers, and one for books. We write the details of the author in the Authors table, and then reference them in the Books table using a foreign key. So here are the schema tables written in Data Definition Language (DDL). I’m using the MySQL variant — annoyingly, all the vendors still maintain slightly different dialects. First, the authors table. We add an automatic ID column index as a primary key. We don’t actually solve the pseudonym problem (I’ll leave that for readers):
CREATE TABLE Authors ( 
 ID int NOT NULL AUTO_INCREMENT, 
 Name varchar(255) not null, 
 Birthday date not null, 
 PRIMARY KEY (ID) 
);
The publishers table follows the same pattern. The “NOT NULL” is another constraint to prevent data being added without content.
CREATE TABLE Publishers ( 
 ID int NOT NULL AUTO_INCREMENT, 
 Name varchar(255) not null, 
 Address varchar(255) not null, 
 PRIMARY KEY (ID) 
);
The Books table will refer to the foreign keys, which leaves it logical but slightly unreadable. Note that we respect that the title of the book together with its publication date makes up the primary key.
CREATE TABLE Books ( 
 Name varchar(255) NOT NULL, 
 AuthorID int, PublisherID int, 
 PublishedDate date NOT NULL, 
 PRIMARY KEY (Name, PublishedDate), 
 FOREIGN KEY (AuthorID) REFERENCES Authors(ID), 
 FOREIGN KEY (PublisherID) REFERENCES Publishers(ID) 
);
To see a neat table like the one at the top, we need a view. This is just a way of sewing together tables so that we can pick out the information we need to display, while leaving the schema untouched. Now that we have the schema written down, we can construct our view:
CREATE VIEW ViewableBooks AS 
SELECT Books.Name 'Book', Authors.Name 'Author', Publishers.Name 'Publisher', Books.PublishedDate 'Date' 
FROM Books, Publishers, Authors 
WHERE Books.AuthorID = Authors.ID 
AND Books.PublisherID = Publishers.ID;
Let’s see if we can produce our schema in an online playground so that we don’t have to install a database. DB Fiddle should do the job. If you enter the DDL and then add the actual data:
INSERT INTO Authors (Name, Birthday) 
VALUES ('Iain Banks', '1954-02-16'); 

INSERT INTO Authors (Name, Birthday) 
VALUES ('Iain M Banks', '1954-02-16'); 

INSERT INTO Publishers (Name, Address) 
VALUES ('Abacus', 'London'); 

INSERT INTO Publishers (Name, Address) 
VALUES ('Orbit', 'New York');
The results of looking at the view is shown below as ‘Query 3’ in DB Fiddle, and it is the data we wanted to see all along: 👁 Image

Can an LLM Also Create Schemas?

OK, so now we want to ask an LLM about creating schemas. To summarize how we want to guide the LLM:
  • When asked in English for a schema, we want it to generate the DDL for three tables, with indexes and constraints.
  • We can also hint at the need for constraints (primary keys, foreign keys, etc.) if we need to.
  • We can ask for a view.
  • We can nudge it toward MySQL syntax, if needed.
I’ll use Llama 3, but I also looked at OpenAI’s LLMs and got much the same results. Our first query: “Create a relational database schema to describe books, publishers and authors.” And the results: 👁 Image
So far so good. It hasn’t created the DDL, but we can ask that separately. It has in some ways done better by describing the schema in English. Let’s look at the rest of the reply: 👁 Image
It has described foreign key constraints and added the ISBN, which I didn’t think of. Also “PublicationDate” is better English than my “PublishedDate.” And it made one more table: 👁 Image
This solves the problem for multiple authors for one book — not a problem I was thinking about. The term bridge table indicates the joining of the two tables (books and authors) via foreign keys. Let’s ask for the DDL: “Show me the data definition language for this schema.” These came back correctly, including the NOT NULLs to ensure no empty entries. It also noted that the DDL was in some ways “generic” due to differences between vendor SQLs in the real world. Finally, let’s ask for a view: 👁 Image
This is a much more complicated version than mine; however, when adapted for my schema naming, it works just fine in DB Fiddle. The table alias naming seen here doesn’t really help with comprehension.

Conclusion: LLMs Can Indeed Do Schemas

I’d say this was a big win for LLMs, as they turned my English description into a well-constrained schema and then executable DDL, while also providing explanations (although those slipped into more technical relationship details). I didn’t even use a specialist LLM or service, so this worked out well. To some degree, this is a mapping of one domain (the publishing world) into another (the domain-specific language of SQL), and this works heavily to an LLM’s strengths. Each domain is well-defined and deep with detail. So, happy birthday SQL, and hopefully LLMs will keep you relevant for a few decades more!
TRENDING STORIES
David has been a London-based professional software developer with Oracle Corp. and British Telecom, and a consultant helping teams work in a more agile fashion. He wrote a book on UI design and has been writing technical articles ever since....
Read more from David Eastman
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.