VOOZH about

URL: https://www.javacodegeeks.com/2015/03/how-to-use-sql-pivot-to-compare-two-tables-in-your-database.html

⇱ How to use SQL PIVOT to Compare Two Tables in Your Database - Java Code Geeks


This can happen ever so easily. You adapt a table by adding a new column:
 
 
 
 
 
 
 
 
 
 

ALTER TABLE payments ADD code NUMBER(3);

You go on, implementing your business logic – absolutely no problem. But then, later on (perhaps in production), some batch job fails because it makes some strong assumptions about data types. Namely, it assumes that the two tables payments and payments_archive are of the same row type:

CREATE TABLE payments
 (
 id NUMBER(18) NOT NULL,
 account_id NUMBER(18) NOT NULL,
 value_date DATE,
 amount NUMBER(25, 2) NOT NULL
 );

CREATE TABLE payments_archive
 (
 id NUMBER(18) NOT NULL,
 account_id NUMBER(18) NOT NULL,
 value_date DATE,
 amount NUMBER(25, 2) NOT NULL
 );

Being of the same row type, you can simply move a row from one table to the other, e.g. using a query like this one:

INSERT INTO payments_archive
SELECT * FROM payments
WHERE value_date < SYSDATE - 30;

(not that using the above syntax is a good idea in general, it’s actually a bad idea. but you get the point)

What you’re getting now is this:

ORA-00913: too many values

The fix is obvious, but probably, the poor soul who has to fix this is not you, but someone else who has to figure out among possibly hundreds of columns, which ones don’t match. Here’s how (in Oracle):

Use PIVOT to compare two tables!

You could of course not use PIVOT and simply select all columns from either table from the dictionary views:

SELECT 
 table_name,
 column_name
FROM all_tab_cols
WHERE table_name LIKE 'PAYMENTS%'

This will produce the following result:

TABLE_NAME COLUMN_NAME 
------------------ ---------------
PAYMENTS ID 
PAYMENTS ACCOUNT_ID 
PAYMENTS VALUE_DATE 
PAYMENTS AMOUNT 
PAYMENTS CODE 
PAYMENTS_ARCHIVE ID 
PAYMENTS_ARCHIVE ACCOUNT_ID 
PAYMENTS_ARCHIVE VALUE_DATE 
PAYMENTS_ARCHIVE AMOUNT

Not very readable. You could of course use set operations and apply INTERSECT and MINUS (EXCEPT) to filter out matching values. But much better:

SELECT *
FROM (
 SELECT 
 table_name,
 column_name
 FROM all_tab_cols
 WHERE table_name LIKE 'PAYMENTS%'
) 
PIVOT ( 
 COUNT(*) AS cnt
 FOR (table_name) 
 IN (
 'PAYMENTS' AS payments, 
 'PAYMENTS_ARCHIVE' AS payments_archive 
 ) 
) t;

And the above now produces:

COLUMN_NAME PAYMENTS_CNT PAYMENTS_ARCHIVE_CNT
------------ ------------ --------------------
CODE 1 0 
ACCOUNT_ID 1 1 
ID 1 1 
VALUE_DATE 1 1 
AMOUNT 1 1

It is now very easy to identify the column that is missing from the PAYMENTS_ARCHIVE table. As you can see, the result from the original query produced one row per column AND per table. We took that result and pivoted it “FOR” the table name, such that we will now only get one row per column

How to read PIVOT?

It’s easy. Comments are inline:

SELECT *

-- This is the table that we're pivoting. Note that
-- we select only the minimum to prevent side-effects
FROM (
 SELECT 
 table_name,
 column_name
 FROM all_tab_cols
 WHERE table_name LIKE 'PAYMENTS%'
) 

-- PIVOT is a keyword that is applied to the above
-- table. It generates a new table, similar to JOIN
PIVOT (

 -- This is the aggregated value that we want to
 -- produce for each pivoted value
 COUNT(*) AS available 

 -- This is the source of the values that we want to
 -- pivot
 FOR (table_name) 

 -- These are the values that we accept as pivot
 -- columns. The columns names are produced from
 -- these values concatenated with the corresponding
 -- aggregate function name
 IN (
 'PAYMENTS' AS payments, 
 'PAYMENTS_ARCHIVE' AS payments_archive 
 ) 
) t;

That’s it. Not so hard, was it?

The nice thing about this syntax is that we can generate as many additional columns as we want, very easily:

SELECT *
FROM (
 SELECT 
 table_name,
 column_name,
 cast(data_type as varchar(6)) data_type
 FROM all_tab_cols
 WHERE table_name LIKE 'PAYMENTS%'
) 
PIVOT ( 
 COUNT(*) AS cnt,
 MAX(data_type) AS type -- new function here
 FOR (table_name) 
 IN (
 'PAYMENTS' AS p, 
 'PAYMENTS_ARCHIVE' AS a
 ) 
) t;

… producing (after additional erroneous DDL) …

COLUMN_NAME P_CNT P_TYPE A_CNT A_TYPE
----------- ---------- ------ ---------- ------
CODE 1 NUMBER 0
ACCOUNT_ID 1 NUMBER 1 NUMBER 
ID 1 NUMBER 1 NUMBER 
VALUE_DATE 1 DATE 1 TIMESTAMP
AMOUNT 1 NUMBER 1 NUMBER

This way, we can discover even more flaws between the different row types of the tables. In the above example, we’ve used MAX(), because we have to provide an aggregation function, even if each pivoted column corresponds to exactly one row in our example – but that doesn’t have to be.

What if I’m not using Oracle?

SQL Server also supports PIVOT, but other databases don’t. You can always emulate PIVOT using GROUP BY and CASE. The following statement is equivalent to the previous one:

SELECT
 t.column_name,
 count(CASE table_name 
 WHEN 'PAYMENTS' THEN 1 END) p_cnt,
 max (CASE table_name 
 WHEN 'PAYMENTS' THEN data_type END) p_type,
 count(CASE table_name 
 WHEN 'PAYMENTS_ARCHIVE' THEN 1 END) a_cnt,
 max (CASE table_name 
 WHEN 'PAYMENTS_ARCHIVE' THEN data_type END) a_type
FROM (
 SELECT 
 table_name,
 column_name,
 data_type
 FROM all_tab_cols
 WHERE table_name LIKE 'PAYMENTS%'
) t
GROUP BY
 t.column_name;

This query will now produce the same result on all the other databases as well.

Isn’t that… ?

Yes, it is! The above usage of aggregate functions in combination with CASE can be shortened even more, using the SQL standard FILTER clause, which we’ve blogged about recently.

So, in PostgreSQL, you could write the following query:

SELECT
 t.column_name,
 count(table_name) 
 FILTER (WHERE table_name = 'payments') p_cnt,
 max(data_type) 
 FILTER (WHERE table_name = 'payments') p_type,
 count(table_name) 
 FILTER (WHERE table_name = 'payments_archive') a_cnt,
 max(data_type) 
 FILTER (WHERE table_name = 'payments_archive') a_type
FROM (
 SELECT 
 table_name,
 column_name,
 data_type
 FROM information_schema.columns
 WHERE table_name LIKE 'payments%'
) t
GROUP BY
 t.column_name;
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
I agree to the Terms and Privacy Policy

Thank you!

We will contact you soon.

👁 Photo of Lukas Eder
Lukas Eder
March 4th, 2015Last Updated: February 28th, 2015
0 162 4 minutes read

Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.
Subscribe

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Back to top button
Close
wpDiscuz