VOOZH about

URL: https://dzone.com/articles/sample-data-generation-with-built-in-database-capa

⇱ Sample Data Generation With Built-In Database Capabilities


Related

  1. DZone
  2. Data Engineering
  3. Databases
  4. Sample Data Generation With Built-In Database Capabilities

Sample Data Generation With Built-In Database Capabilities

Learn how to use special functions, hierarchical queries, and common table expressions to generate sample data in PostgreSQL, Oracle, MySQL, and SQL Server.

By Nov. 10, 23 · Tutorial
Likes
Comment
Save
4.7K Views

Join the DZone community and get the full member experience.

Join For Free

There are many ways to generate dummy data for a database. You can create a data generator, use a mock data service, or get a subset of your production data. Additionally, many databases offer built-in capabilities for producing synthetic data.

In this article, you’ll learn how to use special database functions, hierarchical queries, and recursive Common Table Expressions (CTEs) to generate a sample dataset in PostgreSQL, MySQL, Oracle, and SQL Server.

Sample Table

We'll use the following sample table to demonstrate various built-in data generation techniques:

SQL
create table sample(
 id int, 
 val text
);


The id column stores the record's identifier, while the val column holds some text value.

Once you learn how to generate mock data for the table with two columns, you'll be able to apply the techniques to tables with many more columns of various data types.

Generate Series Function in PostgreSQL

PostgreSQL offers a special generate_series function that can produce a set of numbers or date/time values for a specified range.

For instance, to generate a series of IDs from 1 to 5, you would use the following function call:

SQL
select id from generate_series(1, 5) as id;

+--+
|id|
+--+
|5 |
|4 |
|3 |
|2 |
|1 |
+--+


Afterward, you can insert the generated series into the sample table with this simple statement:

SQL
insert into sample(id)
select id from generate_series(1, 5) as id;


This query adds five records to the table, each with a unique identifier. However, the val column will still be null:

SQL
select * from sample order by id;

+--+----+
|id|val |
+--+----+
|1 |null|
|2 |null|
|3 |null|
|4 |null|
|5 |null|
+--+----+


To populate the val column with non-null values, you simply need to modify the data generation query as follows:

SQL
-- Delete previously generated records
delete from sample;

-- Generate 1000 records using the concat function to populate the `val` column
insert into sample(id,val)
select id, concat('val', id * 10) from generate_series(1, 1000) as id;


Finally, retrieve the first five records from the table to review the sample data:

SQL
select * from sample order by id limit 5;

+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Quick and simple. For more advanced data generation scenarios using the generate_series function, refer to this article.

Generate Series Again ... But in SQL Server

The latest versions of SQL Server have introduced support for the generate_series function. Consequently, whether you're using PostgreSQL or SQL Server, the experience of generating sample data is consistent.

To generate a series of numbers from 1 to 5, you would invoke the function as follows:

SQL
select value from generate_series(1,5);

+-----+
|value|
+-----+
|1 |
|2 |
|3 |
|4 |
|5 |
+-----+


Subsequently, execute the following command to create 1000 records in the sample table:

SQL
insert into sample(id,val)
select value,concat('val', value * 10) from generate_series(1,1000);


To check the generated data, query the first five records:

SQL
select * from sample order by id
offset 0 rows
fetch next 5 rows only;

+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


As demonstrated, the generate_series function is a powerful tool for data generation tasks. However, this function is not part of the SQL standard yet and may not be available in all database systems.

Hierarchical Queries in Oracle

Oracle is among the databases that do not support the generate_series function. However, the database community has devised many alternative methods for generating dummy data.

One popular method involves using hierarchical queries. For instance, the following hierarchical query can generate a series of records from 1 to 5:

SQL
select level from dual connect by level <= 5;

+-----+
|LEVEL|
+-----+
|1 |
|2 |
|3 |
|4 |
|5 |
+-----+


Internally, the query constructs a data tree structure, with the LEVEL pseudo-column indicating the depth of the tree, beginning with one for the root.

By using this hierarchical query with an insert statement, you can produce 1000 records for the sample table:

SQL
-- Oracle doesn't support the `text` data type,
-- requiring you to create the table this way
create table sample (id int, val varchar(10));

-- Generate 1000 records
insert into sample(id,val)
select level, concat('val', level * 10) from dual 
connect by level <= 1000;


The generated data will resemble what you've seen with PostgreSQL and SQL Server:

SQL
select * from sample order by id
offset 0 rows fetch next 5 rows only;

+--+-----+
|ID|VAL |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Recursive Common Table Expressions in MySQL

Like Oracle, MySQL does not support the generate_series function, which necessitates finding alternative methods for generating sample data.

One such method is using recursive common table expressions (CTEs). To illustrate, the following recursive query generates a series of numbers from 1 to 5:

SQL
with recursive seq as (
 select 1 as id union all select id +1 from seq where id < 5
)
select id from seq;

+--+
|id|
+--+
|1 |
|2 |
|3 |
|4 |
|5 |
+--+


Next, you can use this recursion together with the following insert statement to produce 1000 records:

SQL
insert into sample(id,val)
with recursive seq as (
 select 1 as id union all select id +1 from seq where id < 1000
)
select id, concat('val', id * 10) from seq;


Finally, to ensure the mock data has been properly generated, take a quick look at the first five records:

SQL
select * from sample order by id limit 5;

+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Keep Mastering Databases 

As you can see, relational databases are much more than just storage for your application data. They provide a broad set of capabilities, allowing you to perform various tasks right on the database end. One such task is sample data generation, which can sometimes be satisfied with built-in database features.

Stay tuned for more database-related content, and keep mastering your database skills.

In the meantime, you can watch this video to see how the discussed data generation techniques work in practice for much larger tables.


Dummy data MySQL Relational database Synthetic data sql PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • 5 Key Postgres Advantages Over MySQL
  • Useful System Table Queries in Relational Databases
  • Java EE 6 Pet Catalog with GlassFish and MySQL
  • Common Mistakes to Avoid When Writing SQL Code

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

Let's be friends: