Automated Data Warehouse Testing
Build or Buy a Data Quality Solution?
5 Considerations to make a Decision
In this article, we're going to explore the topic of 'Build vs Buy' as it applies to data quality and testing assurance, sharing practical advice on which path to take.
As the demand for better data increases, so does the call for data integrity.
For optimal decision-making, process execution and regulatory compliance, the business demands the right data, at the right time, and of the right quality.
You need to assess, monitor and assure data quality across the entire data lifecycle – starting from the source systems and continuing through to any downstream processing and analytical platforms, such as a data warehouse or data mart.
But this need for end-to-end data quality management poses a question for many organizations:
Should we buy a data automation testing / data quality assessment solution or build our own?
What preparatory considerations do you need to make?
Before you start looking at whether to build a data quality testing platform internally, or procure a commercial product, let's consider some of the preparation activities required. If you skip these, you'll have an incomplete set of criteria on which to evaluate your decision.
Let's use the example of an organization looking to assure the quality of its Business Intelligence data, as this is one of the most common use cases we see for data quality testing and assurance.
5 Considerations to make a Decision:
Consideration #1: What are the known data quality issues that data users have already raised?
Chances are, you're considering a data quality testing solution because there are concerns about data integrity.
For example, a past client reached out to us after senior management had become frustrated at ongoing discrepancies between financial reports produced within their Business Intelligence platform. Poor data quality had eroded the trust in the financial reporting system, causing some managers to revert to Excel, undoing the investment and confidence placed in the new architecture.
Therefore, your first step is to catalogue the type of data quality issues already raised to give you a better idea of what solution could resolve them.
Consideration #2: What are the impacts of not assuring data quality?
As organizations swap Excel and manual reporting for more sophisticated, automated data analytics solutions, the speed at which errors can proliferate across the organization dramatically increases.
You need to consider the likely impact of poor data quality based on the issues you've already witnessed and the possible outcome of any future problems.
Many managers assume that their data is somehow assured and governed because they have a sophisticated data analytics platform.
Sadly, this is not the case.
A modern data platform has thousands of processes and hundreds of system interfaces, any of which could introduce an error that poses a significant risk to the organization.
It's vital that you fully understand the impact of poor quality data, both now and in the future. In doing so, you will clarify the ROI of implementing a robust data quality assurance capability.
Consideration #3: What is a reasonable timeline for putting a data assurance capability in place?
If data integrity/quality issues are apparent, there may be some urgency around building a data assurance capability. Therefore, it's worth clarifying how soon your stakeholders expect a solution to be in place.
The reality is that in-house software development projects take significantly longer to implement than deploying a commercial product. Therefore, the urgency of your timeline needs to be considered carefully in advance.
Consideration #4: What type of similar data solutions have been created internally before? What was the outcome?
Building a data assurance platform is far more complex than a typical software development project.
Your organization may have an internal IT team capable of building applications and simple data processes, but managing data quality across diverse data landscapes presents significant technical barriers that most IT teams have never encountered.
Note: You will learn some of these challenges later in the article.
If your organization has a history of creating similar, complex data assurance and monitoring solutions, it's worth examining these to get an accurate understanding of the resources and costs consumed. You also want to clarify how effective these projects were at delivering against expectations.
If your organization has no history of developing similar data solutions, it's unlikely the IT team will possess the advanced data and software skills required.
Consideration #5: What is the current and future scope of assurance?
Approximately 40% of the organizations that enquire about our data quality and assurance solution are already considering the option to 'self-build' their data quality and testing capability.
Many have already gone down the self-build path and are reaching out for a commercial alternative after experiencing a negative outcome with their internally developed software.
One of the common difficulties with home-grown solutions is 'scope creep'. The internal software can appear to cope reasonably well when the scope is minimal, e.g. some simple data feeds and basic quality checks. But as demand for further assessment and monitoring grows, cracks soon appear.
It's important to realize that any solution needs to scale with forecasted demand. If you're assuring a Business Intelligence platform, you can expect substantially more data feeds (inbound and outbound), not to mention hundreds (or even thousands) of additional reports over time. Bear in mind that you can also expect other departments and use cases to emerge that fall outside of the traditional data quality/testing requirements. Such use cases can include internal audits, regulatory reporting, user acceptance testing, and many more.
Remember that internally developed software will almost always lack the scalability of a modern, purpose-built commercial assurance tool. Be realistic about the future growth and demand for data quality assurance and automated testing.
Understanding the Phases of Business Intelligence (BI) Testing Maturity
Keeping to our use case of an organization looking to test and assure their Business Intelligence, it helps to understand where your organization sits on the maturity curve of testing/quality assurance.
Using the image below, you can gauge your current maturity level.
Over the many years we've delivered data testing and data quality assurance solutions, we have found that most organizations implementing tactical or self-help solutions tend to have a weaker overall strategy for data quality and data testing automation, thereby significantly increasing costs and risks to the business.
We observed that companies investing in appropriate technology are more likely to have a more robust and long-term strategy for data automation testing and data quality assurance, which significantly increases the value from their investment.
We found that organizations that successfully scaled their data testing and data quality capability have nearly always done so by adopting a commercial solution as the foundation for expansion within the organization.
Build vs Buy - What are the essential factors when making a decision?
Now that we've covered what to consider when preparing to make a decision, let's weigh the merits of buy vs build as they relate to data testing automation and data quality assurance.
The Cost Equation
One of the most compelling drivers that we see for building a 'home-grown' solution is the perceived lack of budget for buying a commercial product.
There is a misconception that, given limited funding, it makes sense to develop a solution internally to avoid the upfront capital outlay of a commercial tool.
Sometimes, there are budgets allocated for internal IT development, but not external capital investment. If you have an IT team of developers waiting for coding work, why not get them busy building a solution? When your organization has a freeze on capital procurement, developing an internal product could be your only path to a resolution.
However, one consideration is that many solution providers (BiG EVAL included) significantly reduce capital outlay by offering a leased model. Our research found that the annual cost of 'renting' a commercial solution is lower than the cost of developing, testing, supporting, and maintaining an in-house alternative.
For example, we discovered one organization had hired a developer/consultant to build a data automation testing framework and software solution. After several years, they reached out for help. Working with their team, we replaced the legacy solution in less than 12 weeks and at a fraction of the cost spent on their previous approach.
When building an internal solution, you don't just have to consider the cost of the initial project. There are longer-term maintenance and support budgets required that often get overlooked.
We have also found that internally built software often lacks standardization, making it more difficult to deploy across different use cases, technologies, and business units. With a commercial solution, such as BiG EVAL, we often see our technology exploited across a range of technical and business-focused initiatives. This 'shareability' factor of the latest commercial software also means funding and investment can be spread across departmental budgets, reducing the impact on one team.
In addition, we have found that commercial solutions provide more predictability around future spend than internal solutions. With internally created software, the costs are difficult to predict because it's unclear how the maintenance and development costs will rise as the scope and demand for data assurance increases.
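The cost comparison described in this section can be sketched as simple arithmetic. The snippet below is only an illustrative sketch: the class `BuildCosts`, the functions `build_tco` and `lease_tco`, and every figure in the usage example are hypothetical, unitless placeholders rather than research results. It assumes flat annual costs, which is itself a simplification, since in-house maintenance costs tend to be hard to predict.

```python
from dataclasses import dataclass


@dataclass
class BuildCosts:
    """Hypothetical cost inputs for an in-house build (placeholders only)."""
    initial_development: float  # one-off project cost
    annual_maintenance: float   # fixes, upgrades, new data sources
    annual_support: float       # internal support and training


def build_tco(costs: BuildCosts, years: int) -> float:
    """Total cost of ownership of the in-house option over `years`."""
    recurring = costs.annual_maintenance + costs.annual_support
    return costs.initial_development + years * recurring


def lease_tco(annual_fee: float, years: int) -> float:
    """Total cost of a leased commercial tool over the same period."""
    return annual_fee * years


if __name__ == "__main__":
    # Placeholder, unitless numbers; substitute your own estimates.
    costs = BuildCosts(initial_development=100.0,
                       annual_maintenance=20.0,
                       annual_support=10.0)
    for years in (1, 3, 5):
        print(years, build_tco(costs, years), lease_tco(40.0, years))
```

Running both functions over the same time horizon makes it harder to overlook the recurring maintenance and support terms that are often left out of a build estimate.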
Productivity and Assurance Throughput
Perhaps the most apparent difference we see between buying and building is the 'Same-Day Productivity' benefit of a solution like BiG EVAL.
This stark difference in productivity was apparent with a recent client who had previously carried out all their data assurance via SQL (a database querying language) and a custom piece of internal software designed to schedule and execute scripts.
The client confided that it would take many days to provide a new data assurance process whenever the business requested a report or dashboard from a new data source. The end-to-end process resulted in numerous handoffs between technical and business teams before being deployed into a production environment.
From there, even the most minor update would require several days of turnaround from internal IT, resulting in further productivity bottlenecks.
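To make the home-grown approach concrete, here is a minimal sketch of what a script-based SQL assurance runner might look like. It is not the client's software: the table names (`source_orders`, `dw_orders`), the `CHECKS` dictionary and the functions `run_checks` and `demo_connection` are all assumptions made for illustration, and an in-memory SQLite database stands in for a real source system and data warehouse.

```python
import sqlite3

# Hypothetical reconciliation checks. Each query returns one value,
# and a result of 0 means source and warehouse agree.
CHECKS = {
    "row_count_matches": """
        SELECT (SELECT COUNT(*) FROM source_orders)
             - (SELECT COUNT(*) FROM dw_orders)
    """,
    "amount_total_matches": """
        SELECT (SELECT COALESCE(SUM(amount_cents), 0) FROM source_orders)
             - (SELECT COALESCE(SUM(amount_cents), 0) FROM dw_orders)
    """,
}


def run_checks(conn, checks):
    """Run each check query and map its name to True when the result is 0."""
    results = {}
    for name, sql in checks.items():
        (difference,) = conn.execute(sql).fetchone()
        results[name] = difference == 0
    return results


def demo_connection():
    """Build an in-memory database where one order never reached the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount_cents INTEGER);
        CREATE TABLE dw_orders (id INTEGER PRIMARY KEY, amount_cents INTEGER);
        INSERT INTO source_orders VALUES (1, 1000), (2, 2550), (3, 400);
        INSERT INTO dw_orders VALUES (1, 1000), (2, 2550);
    """)
    return conn


if __name__ == "__main__":
    for name, passed in run_checks(demo_connection(), CHECKS).items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Even in this toy form, every new data source means another hand-written query plus a deployment step, which is where the handoffs and turnaround times described above tend to come from.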
With commercial software such as BiG EVAL, we regularly observe our clients implementing a new assurance process on the same day it was requested.
This emergence of 'Same-Day Productivity' has obvious cost savings for the IT team, but the real benefit is the productivity gain for the business.
Each time your data platform is monitored for data quality, you're targeting issues that will have a damaging impact on the business if ignored. With commercial data assurance tools, you're not only increasing the productivity of the IT team but also transforming the performance of your data analytics and business services teams.
With a commercial solution, our experience has shown that you're able to deploy far more data assurance processes in a fraction of the time compared to custom-built internal solutions.
We're often asked to help organizations mature their data testing and assurance processes after they've already got started with a home-grown software solution. In practically every case, we find that the software they've created has reached a 'dead-end' in terms of its agility to cope with the modern demands of data assurance and data quality testing.
It's easy to think that data assurance involves simple 'data checks' to see if a value has been entered correctly or a process has completed as per requirement. If you only need some basic validations, then home-grown validation solutions may suffice for a tightly defined scope with a simple data source and processing chain.
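For reference, the kind of basic validation described here can be sketched in a few lines. The rule names, the field names (`customer_id`, `amount`, `country`) and the `validate` function below are hypothetical examples, not part of any particular product.

```python
# Hypothetical record-level rules: each maps a rule name to a function
# that returns True when the record passes.
RULES = {
    "customer_id is present":
        lambda r: r.get("customer_id") not in (None, ""),
    "amount is non-negative":
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    "country is a 2-letter code":
        lambda r: isinstance(r.get("country"), str)
        and len(r["country"]) == 2
        and r["country"].isalpha(),
}


def validate(record, rules=RULES):
    """Return the names of the rules that the record fails."""
    return [name for name, rule in rules.items() if not rule(record)]


if __name__ == "__main__":
    good = {"customer_id": "C-1", "amount": 12.5, "country": "CH"}
    bad = {"customer_id": "", "amount": -3, "country": "Switzerland"}
    print(validate(good))
    print(validate(bad))
```

Checks like these work on one record from one source at a time. They say nothing about whether data stays consistent as it moves between systems.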
But the reality is quite different for most organizations.
We've specialized in data warehouse and Business Intelligence data assurance testing for many years, and the diagram below highlights the most common data assurance testing use cases we encounter. The seven capabilities also indicate the 'functionality portfolio' that we provide with our solution.
This image demonstrates that 'data testing' has evolved substantially over the years. The modern data landscape is complex, varied and constantly changing. When you take on the burden of developing data testing and data quality assurance internally, you commit to tackling the present data landscape and building a roadmap for whatever the future holds. Sustaining that roadmap places a great deal of pressure on the average IT team, which is rarely equipped to deal with the full scope of functionality required.
Perhaps the most compelling argument for building your own software comes down to supply and demand.
If your internal business and technical teams demand certain functionality that commercial software providers can't supply, it makes sense to custom build your own solution to meet internal demand.
However, supply is no longer an issue.
Modern, fully-featured technologies, such as BiG EVAL, address all the typical use-cases required for assuring and testing data.
Whether you require an end-to-end Business Intelligence data assurance solution or another form of in-flight data quality assessment, modern commercial solutions cost less than in-house development, deliver superior results and provide far greater productivity.
Are you looking to compare the cost of building your own data testing solution against a commercial alternative?
Book a session with BiG EVAL, and we'll help you calculate the total cost of ownership of our solution compared to the development and support costs of a traditional in-house build.