DataWarehouse Testing
Is a SQL Minus Query helpful for efficient data validation?
Learn about SQL Minus Queries in data validation. Are they efficient? Are they reliable? Do they make sense in your specific scenario?
What is a SQL Minus Query?
A SQL Minus Query helps you subtracting one dataset from another. The result is the difference between these two datasets. Or in other words, what's left from the first dataset, if you subtract the second dataset.
The following graphic reveals the exact way it works.
How to write a SQL Minus Query?
Each SQL syntax provides a MINUS operator that can be used to implement this kind of data subtraction.
Let's assume we have the following two tables. These are our datasets we want to substract from each other. The obvious differences are marked in green and yellow.
Table1
Table2
The MINUS operator allows us to write the following two queries to get either the green records or the yellow records.
Getting the rows that only exist in Table1
(SELECT Id, Name FROM Table1)
MINUS
(SELECT Id, Name FROM Table2)
The result is the following, as only the row that is marked in green doesn't exist in Table2.
Getting the rows that only exist in Table2
(SELECT Id, Name FROM Table2)
MINUS
(SELECT Id, Name FROM Table1)
The result is the following, as only the rows that are marked in yellow do not exist in Table1.
FREE eBook
You want a Simple and Reliable Data Test Concept?
Creating a Winning
Data Warehouse Test Concept
Download our latest eBook and learn how to create a simple yet effective testing strategy.
How to use this in Data Validation Scenarios?
You may wonder how we can utilize the SQL MINUS Query in data validation scenarios. Here are some examples:
Detecting missing Records
By subtracting two lists of Id's - each coming from table1 and table2 - we can find out, whether all records from Table1 were copied to Table2. This may be helpful in a data integration process where we read data from Table1 (=ERP-System) and integrate them into Table2 (=DWH-System) using an ETL-Process.
Detecting additional Records
Doing the same as described in the previous example with subtracting the Id's of the ERP from the Id's of the DWH, you can detect rows that are present in the DWH, but not in your source system. There are some rare scenarios where this may make sense.
Detecting records with differences in specific fields
By utilizing a MINUS query you can also detect records, that were changed for the bad during the transformation process. Let's assume, Emma's name was changed to Ema (just one 'm'). The record is still present, but it's simply not correct. The MINUS-query would return this records, because it gets considered as differently.