Data Warehouse test concept
Testing a data warehouse system begins with the creation of a conscientious test concept. Below we show you what your test concept should include and which test methods and test tools are available to you.
Create a meaningful test concept
There is no universal recipe for testing your data warehouse system. Every system and every project is individual and must therefore be tested individually. Sometimes technical conditions set the limits, sometimes it is simply the budget. In any case, you should create a conscientious test concept from the beginning, using the following methods:
Unit testing
During development, individual pieces of code are tested for correct functioning, without regard to their dependencies. This way you can check, for example, single data transformation tasks or even MDX formulas.
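As a minimal sketch, such a transformation task can be tested in isolation. The function below is a hypothetical example (normalizing country codes), and the tests are plain assert-based test functions as used with pytest:

```python
def normalize_country_code(raw: str) -> str:
    """Hypothetical transformation task: trim, upper-case and
    map legacy codes to ISO 3166-1 alpha-2."""
    legacy = {"GER": "DE", "SUI": "CH"}
    code = raw.strip().upper()
    return legacy.get(code, code)

def test_trims_and_uppercases():
    assert normalize_country_code(" de ") == "DE"

def test_maps_legacy_codes():
    assert normalize_country_code("GER") == "DE"
```

Because the function has no dependencies on databases or other components, these tests run in milliseconds on every build.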
Integration testing
Do the individual components of the system work together properly? This can only be verified by deploying them in a test environment and reviewing them as a whole.
Data plausibility check
This process checks whether the data is accurate and credible. By involving business users, you increase the value of this test method.
User acceptance testing
This type of testing has two objectives with respect to the user: on the one hand, it ensures that the data provided to the user meets expectations; on the other hand, it ensures the same for the tools provided to the user.
Regression testing
Testing a function over and over again increases the certainty that no errors creep in during development that affect an already tested part of the system. Adhering to the short release cycles of agile projects is not possible without regression testing.
In addition, regression testing in the productive operation of the system can be very useful to regularly monitor the quality of the data, thereby providing an early warning system for any problems that occur.
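Such an early-warning check can be as simple as re-running the same tolerance test after every load. The sketch below assumes hypothetical row counts supplied by a monitoring job:

```python
def within_tolerance(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Regression-style data quality check: flag if the current value
    deviates from the baseline by more than the relative tolerance."""
    if baseline == 0:
        return current == 0
    return abs(current - baseline) / abs(baseline) <= tolerance

# Hypothetical figures from a nightly load
assert within_tolerance(current=101_500, baseline=100_000)      # +1.5% -> OK
assert not within_tolerance(current=80_000, baseline=100_000)   # -20% -> raise an alert
```

Run against every load, a check like this turns a one-off test into continuous quality monitoring.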
Performance testing
This testing process is an important aspect of user acceptance and therefore cannot simply be ignored. Does loading data slow the system down? Do parallel data queries slow the system down?
Specific tests for your data
Compare number of records
Compare the number of data records in the source systems with those in the data warehouse for a first overview.
You only find out which records are missing, superfluous, or duplicated when you review business keys.
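As an illustration (pure Python, with hypothetical key lists), comparing business keys pinpoints exactly which records are missing, extra, or duplicated, which a plain count comparison cannot:

```python
from collections import Counter

def compare_business_keys(source_keys, dwh_keys):
    """Return keys missing in the warehouse, unexpected extra keys,
    and keys that were loaded more than once."""
    src, dwh = Counter(source_keys), Counter(dwh_keys)
    missing = sorted(set(src) - set(dwh))
    extra = sorted(set(dwh) - set(src))
    duplicated = sorted(k for k, n in dwh.items() if n > 1)
    return missing, extra, duplicated

missing, extra, duplicated = compare_business_keys(
    source_keys=["C1", "C2", "C3"],
    dwh_keys=["C1", "C2", "C2", "C4"],
)
# missing == ["C3"], extra == ["C4"], duplicated == ["C2"]
```

In practice the key lists would come from queries against the source system and the warehouse.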
Validate source data
Do not allow erroneous data to be loaded into your data warehouse: check that the data provided is error-free as early as the staging area.
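A minimal staging validation might look like the following sketch; the field names and rules are hypothetical examples:

```python
def validate_staging_row(row: dict) -> list:
    """Return a list of validation errors for one staged record;
    an empty list means the row may be loaded."""
    errors = []
    if not row.get("customer_id"):
        errors.append("customer_id is mandatory")
    if "@" not in row.get("email", ""):
        errors.append("email looks invalid")
    return errors

good = {"customer_id": "C1", "email": "a@example.com"}
bad = {"customer_id": "", "email": "not-an-email"}
assert validate_staging_row(good) == []
assert len(validate_staging_row(bad)) == 2
```

Rows with a non-empty error list are rejected or quarantined instead of being loaded further.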
Check scopes of validity
Would you load a customer record into your data warehouse that has a date of birth around 1850? Validate information that is subject to restrictions early.
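The birth-date example can be sketched as a simple domain check; the 120-year upper bound is an assumption you would agree with the business:

```python
from datetime import date

def birth_date_plausible(birth_date: date, today: date) -> bool:
    """Domain check: reject birth dates implying an age over 120 years
    or lying in the future. The 120-year bound is an assumption."""
    oldest = date(today.year - 120, 1, 1)
    return oldest <= birth_date <= today

assert not birth_date_plausible(date(1850, 5, 1), today=date(2024, 1, 1))
assert birth_date_plausible(date(1990, 5, 1), today=date(2024, 1, 1))
```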
Check historization
Be sure to check whether your implementation of historized data in Slowly Changing Dimensions (SCD type 2) is correct. Are there any gaps? Are there any overlaps or transposed dates?
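A sketch of such an SCD 2 check, assuming each dimension row for one business key carries a `(valid_from, valid_to)` interval with an exclusive end date: sort the intervals and verify that each version starts exactly where the previous one ended.

```python
from datetime import date

def scd2_errors(rows):
    """Check the version history of one business key for gaps and overlaps.
    Each row is a (valid_from, valid_to) tuple; valid_to is exclusive."""
    errors = []
    rows = sorted(rows)
    for (_, prev_to), (next_from, _) in zip(rows, rows[1:]):
        if next_from > prev_to:
            errors.append(f"gap between {prev_to} and {next_from}")
        elif next_from < prev_to:
            errors.append(f"overlap between {next_from} and {prev_to}")
    return errors

history = [
    (date(2020, 1, 1), date(2021, 6, 1)),
    (date(2021, 6, 1), date(2023, 1, 1)),    # seamless -> OK
    (date(2023, 3, 1), date(9999, 12, 31)),  # gap after 2023-01-01
]
assert scd2_errors(history) == ["gap between 2023-01-01 and 2023-03-01"]
```

The same logic is easily expressed as a self-join in SQL and run per business key.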
Check value distributions
Classifiable data should be checked for its distribution. Were around 40% of your sales really made in product group A and only 10% in product group B?
Check aggregates and KPIs
You can mathematically verify business logic in the data warehouse or in analysis systems such as OLAP cubes and, if necessary, compare the results with values from your source systems.
If you involve business users in your tests, you'll benefit from their knowledge. Check, for example, whether November and December sales differ in a credible manner.
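Reconciling an aggregate between source system and warehouse can be sketched as follows; the monthly revenue figures are hypothetical:

```python
def reconcile(source_totals: dict, dwh_totals: dict, abs_tolerance: float = 0.01):
    """Compare monthly totals from the source system with those reported
    by the warehouse or OLAP cube; return the months that do not match."""
    months = set(source_totals) | set(dwh_totals)
    return sorted(
        m for m in months
        if abs(source_totals.get(m, 0.0) - dwh_totals.get(m, 0.0)) > abs_tolerance
    )

source = {"2023-11": 125_000.00, "2023-12": 180_000.00}
dwh    = {"2023-11": 125_000.00, "2023-12": 179_500.00}
assert reconcile(source, dwh) == ["2023-12"]
```

Any month in the returned list is handed to a business user, who can judge whether the deviation is plausible.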
Monitor query performance
Define reference queries or record real queries from your users to continuously monitor their performance.
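A minimal sketch of monitoring a reference query's runtime; the query callable and the threshold are placeholders you would replace with a real warehouse query and an agreed service level:

```python
import time

def timed_query(run_query, threshold_seconds: float):
    """Run a reference query via the supplied callable and report whether
    its runtime stays below the agreed threshold."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= threshold_seconds

# Stand-in for a real reference query against the warehouse
result, elapsed, ok = timed_query(lambda: sum(range(1_000)), threshold_seconds=2.0)
assert ok
```

Logging `elapsed` for every run gives you a trend line, so creeping performance degradation is visible long before users complain.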
BiG EVAL takes over the manual testing of ETL processes and data warehouse components. It does this by purposefully checking and comparing data from source systems, the data warehouse, staging areas, and all other components - just as you would do manually, only automated.