DataWarehouse Testing

Is a SQL Minus Query helpful for efficient data validation?

Learn about SQL Minus Queries in data validation. Are they efficient? Are they reliable? Do they make sense in your specific scenario?

What is a SQL Minus Query?

A SQL Minus Query helps you subtracting one dataset from another. The result is the difference between these two datasets. Or in other words, what's left from the first dataset, if you subtract the second dataset.

The following graphic reveals the exact way it works.

There are several ways to apply a Minus-Query depending on the order of the datasets within the query.
Dataset 1 MINUS Dataset 2 = Area A
Dataset 2 MINUS Dataset 1 = Area B

How to write a SQL Minus Query?

Each SQL syntax provides a MINUS operator that can be used to implement this kind of data subtraction.

Let's assume we have the following two tables. These are our datasets we want to substract from each other. The obvious differences are marked in green and yellow.

Table1

Id

Name

1

Felipe

2

Andrea

3

Sebastian

4

Emma

Table2

Id

Name

1

Felipe

2

Andrea

3

Sebastian

4

Mario

5

Nathan

The MINUS operator allows us to write the following two queries to get either the green records or the yellow records.

Getting the rows that only exist in Table1

(SELECT Id, Name FROM Table1)
  MINUS
(SELECT Id, Name FROM Table2)

The result is the following, as only the row that is marked in green doesn't exist in Table2.

Id

Name

4

Emma

Getting the rows that only exist in Table2

(SELECT Id, Name FROM Table2)
  MINUS
(SELECT Id, Name FROM Table1)

The result is the following, as only the rows that are marked in yellow do not exist in Table1.

Id

Name

4

Mario

5

Nathan

How to use this in Data Validation Scenarios?

You may wonder how we can utilize the SQL MINUS Query in data validation scenarios. Here are some examples:

Detecting missing Records

By subtracting two lists of Id's - each coming from table1 and table2 - we can find out, whether all records from Table1 were copied to Table2. This may be helpful in a data integration process where we read data from Table1 (=ERP-System) and integrate them into Table2 (=DWH-System) using an ETL-Process.

Detecting additional Records

Doing the same as described in the previous example with subtracting the Id's of the ERP from the Id's of the DWH, you can detect rows that are present in the DWH, but not in your source system. There are some rare scenarios where this may make sense.

Detecting records with differences in specific fields

By utilizing a MINUS query you can also detect records, that were changed for the bad during the transformation process. Let's assume, Emma's name was changed to Ema (just one 'm'). The record is still present, but it's simply not correct. The MINUS-query would return this records, because it gets considered as differently.

Drawbacks of SQL Minus Queries

  • Developers are needed to write them.
  • They only show the differences of either the left or the right table.
  • Rows with differences are shown the same way as completely missing rows.
  • Minus-Querying tables from physically different data sources is not possible.

Do the first step! Get in touch with BiG EVAL...


FREE eBook

eBook Successfully Implementing Data Quality Improvements

Successfully Implementing
Data Quality Improvements

This free eBook unveils one of the most important secrets of successful data quality improvement projects.