VOOZH about

URL: https://dzone.com/articles/postgresql-random-test-data

⇱ Generate Random Test Data in PostgreSQL


Related

  1. DZone
  2. Data Engineering
  3. Databases
  4. Generate Random Test Data in PostgreSQL

Generate Random Test Data in PostgreSQL

Learn how to use PostgreSQL to generate random data for testing purposes, and use PL/pgSQL to automatically insert random values into your database tables.

By May. 01, 26 · Tutorial
Likes
Comment
Save
2.2K Views

Join the DZone community and get the full member experience.

Join For Free

When developing and testing applications that use a PostgreSQL database, it's often helpful to populate your tables with random data. Whether you're testing queries, performance, or database functionality, having a set of test data can help ensure your application performs as expected.

In this guide, we'll walk through how to create an anonymous PL/pgSQL block that generates random data and inserts it into a PostgreSQL table. The data will include various types such as integers, strings, dates, booleans, and UUIDs.

Why Use Random Data?

Random data is crucial in testing because it helps simulate real-world scenarios. For example:

  • Stress testing: Populate your tables with a large amount of data to see how your system performs under load.
  • Edge case testing: Generate random values that might help uncover issues with validation or boundaries.
  • Non-deterministic testing: Ensure your application works correctly regardless of the specific data used.

The PostgreSQL Code: Generating Random Data

The following steps outline how to write a PL/pgSQL block that generates and inserts random data into a PostgreSQL table:

1. Set Up Your PostgreSQL Table

First, make sure you have a table that you want to populate with random data. Here's an example of a simple table:

SQL
CREATE TABLE IF NOT EXISTS test_schema.test_tab2
(
 id BIGINT NOT NULL,
 fname VARCHAR(50),
 lname VARCHAR(50),
 create_date DATE,
 status BOOLEAN,
 CONSTRAINT test_tab1_pkey PRIMARY KEY (id)
);


This table includes:

  • An id (bigint)
  • A fname (string)
  • A lname (string)
  • A create_date (date)
  • A status (boolean)

2. Generate Random Data With PL/pgSQL

Now, we can write a PL/pgSQL anonymous block that generates random data and inserts it into the table. This script will:

  • Randomly generate values for each column based on the data type.
  • Insert a specified number of rows (in this case, 10).
  • Print the generated SQL statements for debugging and visibility.

Here’s the code:

SQL
DO $$
DECLARE
 rec_count INTEGER := 10; -- Limit to 10 records for testing
 col RECORD;
 col_list TEXT := '';
 val_list TEXT := '';
 sql_stmt TEXT;
 i INTEGER;
 tbl_schema TEXT := 'test_schema';
 tbl_name TEXT := 'test_tab2';
 random_date DATE;
 random_status BOOLEAN;
BEGIN
 -- Construct column names for insert statement
 FOR col IN
 SELECT column_name, data_type
 FROM information_schema.columns
 WHERE table_schema = tbl_schema
 AND table_name = tbl_name
 ORDER BY ordinal_position
 LOOP
 col_list := col_list || col.column_name || ', ';
 END LOOP;

 -- Trim trailing comma from column list
 col_list := left(col_list, length(col_list) - 2);

 -- Loop to insert rows
 FOR i IN 1..rec_count LOOP
 -- Initialize val_list for each row
 val_list := '';

 -- Loop through each column type to generate corresponding values for each row
 FOR col IN
 SELECT column_name, data_type
 FROM information_schema.columns
 WHERE table_schema = tbl_schema
 AND table_name = tbl_name
 ORDER BY ordinal_position
 LOOP
 -- Generate value for each column based on its data type
 CASE col.data_type
 WHEN 'bigint' THEN
 val_list := val_list || i || ', ';
 WHEN 'character varying' THEN
 val_list := val_list || quote_literal(col.column_name || '_' || i) || ', ';
 WHEN 'text' THEN
 val_list := val_list || quote_literal(col.column_name || '_' || i) || ', ';
 WHEN 'date' THEN
 -- Generate a random date between 2000-01-01 and 2009-12-31
 random_date := '2000-01-01'::date + trunc(random() * 366 * 10)::int;
 val_list := val_list || quote_literal(random_date) || ', ';
 WHEN 'boolean' THEN
 -- Generate a random boolean value (TRUE/FALSE)
 random_status := (i % 2 = 0); -- TRUE if even, FALSE if odd
 val_list := val_list || random_status || ', ';
 WHEN 'uuid' THEN
 val_list := val_list || 'gen_random_uuid(), ';
 ELSE
 val_list := val_list || 'NULL, ';
 END CASE;
 END LOOP;

 -- Trim trailing comma from val_list
 val_list := left(val_list, length(val_list) - 2);

 -- Prepare the SQL statement with dynamically generated values
 sql_stmt := format(
 'INSERT INTO %I.%I (%s) VALUES (%s);',
 tbl_schema, tbl_name,
 col_list,
 val_list
 );

 -- Print the SQL statement to the console
 RAISE NOTICE 'Executing: %', sql_stmt;

 -- Execute the SQL statement
 EXECUTE sql_stmt;

 -- Print confirmation of each inserted row
 RAISE NOTICE 'Inserted row % into %I.%I', i, tbl_schema, tbl_name;
 END LOOP;
END
$$;


How This Code Works

  • col_list: This variable dynamically collects the column names from the table schema.
  • val_list: For each row, this variable dynamically generates the values for each column, based on its data type (e.g., integers, strings, dates, booleans).
  • Random data generation:
    • Bigint: We use the row number (i) as a simple value for bigint columns.
    • Strings (fname, lname): We concatenate the column name with the row number (e.g., fname_1, lname_1).
    • Date: We generate a random date between 2000-01-01 and 2009-12-31 using the expression '2000-01-01'::date + trunc(random() * 366 * 10)::int.
    • Boolean: The status column is set to TRUE for even rows and FALSE for odd rows.
    • UUID: A random UUID is generated using gen_random_uuid().
  • SQL Statement Execution: The script then dynamically constructs an INSERT INTO SQL statement and executes it for each row, inserting the data into the table.

3. Executing the Code

After writing the code, you can run it in your PostgreSQL environment. The script will print the SQL INSERT statements as it executes, so you can verify what is being inserted.

4. Verifying the Results

You can use a simple SELECT Query to verify the random data was inserted:

SQL
SELECT * FROM test_schema.test_tab2;


This will display all the records that were inserted with the random data.


Benefits of Using This Method

  • Flexibility: The script can easily be modified to generate more rows or handle additional columns and data types.
  • Dynamic data generation: The data is dynamically generated based on the schema of the table, so no manual input is needed.
  • Realistic testing: By generating random values, you simulate a variety of real-world scenarios, making your tests more robust and reliable.

Conclusion

Generating random test data in PostgreSQL can be a powerful tool for developers and testers. Whether you’re building new features, performing load testing, or ensuring data integrity, using dynamic PL/pgSQL scripts to generate test data allows you to automate the process and focus on the logic of your application.

By following this guide, you can easily populate any PostgreSQL table with random data and streamline your testing and development process.

Database Test data Testing PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Database Connection Pooling at Scale: PgBouncer + Multi-Tenant Postgres (10K Concurrent Connections)
  • The New Testing Pattern: Standardizing Regression for Cloud Migrations
  • Unit Testing SQL Queries Across Multiple Database Platforms
  • Why PostgreSQL Vacuum Matters More Than You Think

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

Let's be friends: