How to build an operational data quality process

In a recent article, we outlined 'How to create a DataOps process for Data Quality and Data Testing'. Our goal with that article was to share an architectural framework and process for DataOps within a typical data analytics and warehousing environment where data quality and testing are critical.

In this article, 'How to build an operational data quality process', we're going to drill a little deeper into what it takes to build a fully operational data quality team and function capable of managing the end-to-end Data Quality Management Process.

Making a case for data quality

Your first challenge is likely to be securing funding and support for data quality management. Many leaders will assume that the initial system testing phase will have been sufficient to identify any data defects.

In reality, data quality issues can arise at any point in a system's operational life cycle, so they must be continuously managed and resolved.

Some organisations with a more mature and established approach to data management may already have a 'Data Quality Centre of Excellence' or dedicated data quality team. 

If not, it's OK to commence a data quality management initiative within an area such as a data analytics or data warehouse environment because this will form an excellent access point for assessing large volumes of data from across the business. 

Any defects discovered can be shared with the upstream business units, helping to tackle the root cause of the issue whilst providing considerable operational savings.

Many organisations use their experience managing data quality within a data analytics environment as a platform for launching a much broader, enterprise-wide data quality capability.

Getting your 'data plumbing' in order

Before commencing with your operational data quality process, you need to define the scope of where you intend to start. 

Depending on the complexity and scale of the data in your analytics environment, there may be hundreds of data sources flowing into your warehousing systems and analytics platforms.

The first thing your team will need to gather is an accurate view of where the data is flowing in and out of the 'in scope' data landscape. From there, you can start to identify some of the more critical data to commence the initial data quality management process.
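One lightweight way to capture this view is as a list of source-to-target flows. The sketch below is illustrative only: the system names are hypothetical, and in practice this information would usually come from a data catalogue or lineage tool rather than a hand-written list.

```python
from collections import defaultdict

# Hypothetical data flows, written as (source, target) pairs.
FLOWS = [
    ("crm", "staging"),
    ("erp", "staging"),
    ("web_shop", "staging"),
    ("staging", "warehouse"),
    ("warehouse", "sales_dashboard"),
    ("warehouse", "finance_report"),
]


def upstream_sources(target, flows):
    """Return every system that feeds `target`, directly or indirectly."""
    # Index the flows so each system maps to the systems that feed it.
    parents = defaultdict(set)
    for source, dest in flows:
        parents[dest].add(source)

    # Walk backwards from the target until no new systems are found.
    found = set()
    to_visit = [target]
    while to_visit:
        node = to_visit.pop()
        for parent in parents[node]:
            if parent not in found:
                found.add(parent)
                to_visit.append(parent)
    return found


# Which systems could introduce a defect into the finance report?
print(sorted(upstream_sources("finance_report", FLOWS)))
```

Listing the upstream systems for each critical report is a simple way to decide which data sources belong in the initial scope.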

The Data Quality Management Lifecycle

The typical functions of your data quality team, or designated team members responsible for data quality, will consist of:

Data Quality Rule Discovery and Specification

Data Quality Rule Assessment and Monitoring

Data Quality Defect Resolution and Improvement

Data Quality Reporting and Planning

Data Quality Rule Discovery and Specification

Your team will first be tasked with building a 'library' of data quality rules to manage the initial in-scope data identified in the earlier scoping review. 

A typical data quality library should provide several benefits:

  • Provide a central, easy-to-use interface, allowing quick and robust data quality rule creation

  • Enable changes and additions to be audited and logged (particularly relevant for regulatory control and data governance)

  • Provide the data quality specifications for executing the assessment and monitoring tests that will follow in a later stage

Creating data quality rules requires a blend of technical and business expertise. For example, some rules may specify specific ranges or predefined sets of acceptable values based on business events. 

All of this means that whoever creates the rules must have the confidence to engage with the business community to ask the right questions and translate the responses into well-defined rule specifications.
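To make the idea of a rule specification concrete, here is a minimal sketch of a rule library containing one range rule and one predefined-set rule. Everything in it (the Rule class, the column names, the thresholds) is a hypothetical illustration and not the format used by BiG EVAL or any other product.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One entry in a hypothetical data quality rule library."""
    name: str
    column: str
    description: str  # business wording agreed with the data owner
    check: Callable[[Any], bool]


def in_range(low, high):
    """Build a check that accepts numbers between low and high inclusive."""
    return lambda value: isinstance(value, (int, float)) and low <= value <= high


def in_set(allowed):
    """Build a check that accepts only values from a predefined set."""
    allowed = frozenset(allowed)
    return lambda value: value in allowed


# Illustrative rules; the thresholds and status values are assumptions.
RULE_LIBRARY = [
    Rule("discount_range", "discount_pct",
         "Discount must be between 0 and 50 percent", in_range(0, 50)),
    Rule("order_status_valid", "status",
         "Status must be one of the agreed order states",
         in_set({"NEW", "SHIPPED", "CANCELLED"})),
]


def failed_rules(record, rules):
    """Return the names of all rules that the record breaks."""
    return [rule.name for rule in rules
            if not rule.check(record.get(rule.column))]


print(failed_rules({"discount_pct": 75, "status": "NEW"}, RULE_LIBRARY))
```

Keeping the business description next to the technical check is one way to let business and technical staff review the same rule.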

Important note: Historically, many organisations would code their data quality rules using complex and technical coding languages, making them difficult to manage and share. But today, with the advent of modern technology (such as our BiG EVAL solution), organisations can now create a flexible, easy-to-use library of rules that can be shared and re-used across the entire business.

Data Quality Rule Assessment and Monitoring

At this point, your team should have their initial library of data quality rules in place for the data you prioritised as in-scope.

The next step is to run an initial assessment to observe how many defects are discovered. The team will also need to fine-tune the rules to trap the most severe defects whilst providing warnings for lesser issues that, although a hindrance, may have less operational impact.
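The distinction between severe defects and warnings can be sketched by attaching a severity to each rule and counting failures per severity. The rule names, fields and sample rows here are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical rules: (name, severity, check function).
RULES = [
    ("customer_id_present", "error",
     lambda row: bool(row.get("customer_id"))),
    ("amount_not_negative", "error",
     lambda row: isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0),
    ("phone_present", "warning",
     lambda row: bool(row.get("phone"))),
]


def assess(rows, rules):
    """Run every rule against every row and count failures by severity."""
    counts = Counter()
    for row in rows:
        for _name, severity, check in rules:
            if not check(row):
                counts[severity] += 1
    return counts


# Illustrative sample data, not real records.
SAMPLE_ROWS = [
    {"customer_id": "C1", "amount": 120.0, "phone": "0123"},
    {"customer_id": "", "amount": 50.0, "phone": ""},
    {"customer_id": "C3", "amount": -5.0, "phone": ""},
]

counts = assess(SAMPLE_ROWS, RULES)
print(dict(counts))
```

Fine-tuning then becomes a matter of moving a rule between severities or adjusting its check, and re-running the assessment to see how the counts change.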

One concept your team should introduce is a 'Data Quality Firewall' around your analytics data landscape, or whichever data platforms you are assessing. The goal of a Data Quality Firewall is to ensure that poor quality data is tracked and blocked before entering or leaving the analytics platform (or other data locations).

To achieve a Data Quality Firewall, you'll need to ensure that the data quality tool you use is capable of performing two types of data quality assessment:

1. Source Data Quality Trapping/Gating

2. Real-Time Data Quality Monitoring

The idea with 'gating' data quality is that any source systems can make calls to your data quality libraries to validate data before it even enters a system or information chain. 

Gating data quality (with pre-defined rules from your library) is the 'holy grail' of data quality management because it traps a defect before it can cause havoc moving across the different systems found within a typical organisation.

With real-time monitoring, you're ensuring that any 'in-flight' processes that depend on high-quality information can trap defects during a live operational process.
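As a rough sketch of the gating idea, the function below splits an incoming batch into records that pass every rule and records that are held back, together with the reasons. The rules, reference data and field names are hypothetical; a real gate would call the rules held in your shared library.

```python
# Hypothetical reference data for the example.
VALID_PRODUCT_CODES = {"A100", "B200", "C300"}


def valid_product_code(record):
    return record.get("product_code") in VALID_PRODUCT_CODES


def positive_quantity(record):
    quantity = record.get("quantity")
    return isinstance(quantity, int) and quantity > 0


GATE_RULES = {
    "valid_product_code": valid_product_code,
    "positive_quantity": positive_quantity,
}


def gate(records, rules):
    """Split records into (accepted, rejected) before they are loaded.

    Each rejected entry carries the names of the rules it failed, so the
    defect can be reported back to the source system.
    """
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in rules.items() if not check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


incoming = [
    {"product_code": "A100", "quantity": 3},
    {"product_code": "Z999", "quantity": 1},
    {"product_code": "B200", "quantity": 0},
]
accepted, rejected = gate(incoming, GATE_RULES)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Only the accepted records would continue into the target system; the rejected ones can be quarantined and routed to the defect resolution process.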

Data Quality Defect Resolution and Improvement

Once data defects have been identified, you then need to decide what action to take, which will typically involve one of the following options:

Option 1: Cleanse or scrub the data during an operational process

Where a repeating defect is difficult to prevent, you may elect to clean up the data by transforming it to a correct value. Sometimes you may wish to go back in time and fix any historical data, ideally at the source system.
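A cleansing step of this kind might look like the following sketch, which standardises country values against a lookup of known bad variants. The mapping and the list of valid codes are invented for the example; values that cannot be fixed are deliberately left untouched so they can still be reported as defects.

```python
# Hypothetical lookup from known bad values to the agreed standard value.
COUNTRY_FIXES = {
    "uk": "GB",
    "u.k.": "GB",
    "united kingdom": "GB",
    "germany": "DE",
    "deutschland": "DE",
}
VALID_CODES = {"GB", "DE"}


def cleanse_country(value):
    """Return (cleaned_value, was_changed) for a country field."""
    stripped = value.strip()
    if stripped.upper() in VALID_CODES:
        # Already a valid code apart from case or surrounding spaces.
        cleaned = stripped.upper()
    else:
        # Apply a known fix, or keep the original value if none exists.
        cleaned = COUNTRY_FIXES.get(stripped.lower(), value)
    return cleaned, cleaned != value


for raw in ["United Kingdom", " gb ", "GB", "Atlantis"]:
    print(repr(raw), "->", cleanse_country(raw))
```

Returning a flag alongside the cleaned value makes it possible to log every change, which helps when the same fix later needs to be applied at the source system.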

Option 2: Improve the data validation and ingestion/creation process

When you've identified the source of a data defect, you may elect to improve the data ingestion or creation process so that any future defects are spotted and prevented in advance.

A modern data quality tool (such as BiG EVAL) should allow you to call data quality rules that provide ingestion validation routines that block poor quality data at the source. 

The benefit of using a shared library of rules is that each rule (such as a check for valid product codes) can be reused across the entire organisation instead of having to manually code separate rule logic in each different application, requiring considerably more development and maintenance resource.



Data Quality Reporting and Planning

Finally, your data quality team will be responsible for providing daily, weekly and monthly reports of data quality progress. These reports will need to be tailored to the different audiences within the business and technical communities.

For example, IT staff responsible for managing data pipelines in a data analytics platform may need 24/7 reporting that highlights current defect rate, source and impact, so that any data movement tasks can be repeated or re-scheduled.
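A report like this boils down to aggregating assessment results by source. The sketch below assumes each assessment run records a source system, the number of records checked and the number that failed; the figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical assessment results:
# (source system, records checked, records failed)
RESULTS = [
    ("crm", 1000, 25),
    ("erp", 4000, 40),
    ("crm", 500, 5),
    ("web_shop", 200, 30),
]


def defect_rate_by_source(results):
    """Return {source: failed / checked} across all runs for that source."""
    checked = defaultdict(int)
    failed = defaultdict(int)
    for source, n_checked, n_failed in results:
        checked[source] += n_checked
        failed[source] += n_failed
    # Skip sources with nothing checked to avoid dividing by zero.
    return {source: failed[source] / checked[source]
            for source in checked if checked[source] > 0}


rates = defect_rate_by_source(RESULTS)
# List the worst sources first so they can be investigated first.
for source, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{source}: {rate:.1%}")
```

The same aggregated figures can feed both audiences: the per-source detail for pipeline owners and a single trend line for senior leaders.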

Senior business leaders may simply want to see the defect rate go down and the value to the business go up.

Some tools (such as BiG EVAL DQM) already have built-in management and technical reporting capabilities, so be sure to check that whatever operational data quality software you consider can support this critical activity.

During the planning activity, you'll be gathering data from the data quality reports and improvement work carried out and identifying where next to prioritise future assessment, monitoring and improvement activities.

A key aspect of data quality planning is identifying critical strategic goals or known 'data hotspots' where data quality assessment and improvement could have immediate and long-term gains. If you're leading the initiative, you'll be working with a mix of technical, operations and leadership teams to identify where next to extend your focus, what resources will be required, and what a likely roadmap for implementation would look like.

What team structure do we need for operationalising data quality management?

It's often assumed that you'll need to hire new staff and expertise to build your operational data quality management capability, but that's often not the case.

In most of our engagements with clients that utilise BiG EVAL for data quality testing, assessment and monitoring, they've been able to reallocate existing staff to perform all necessary data quality functions.

Here's a round-up of the typical skillsets you'll need for each phase:


Data Quality Rule Discovery and Specification

Here, you'll likely need a mix of business analysts and test analysts to gather the data quality requirements and translate them into data quality rule definitions for assessment and monitoring purposes.

You may also draw on the support of data analysts, database administrators, and occasionally developers/coders, to ensure that you've fully mapped out the operational data environment and translated it into a robust library of data quality testing rules.

Data Quality Rule Assessment and Monitoring

In this stage, you're relying on test analysts to execute the rules, assess the findings, and define the correct monitoring frequency for each type of business need.

This phase can feel more technical, yet we still often find business users getting involved, mainly because our data quality testing technology is easier to use and manage than the more traditional tech-heavy approaches.

Data Quality Defect Resolution and Improvement

Here, you would typically see your test analysts working with data analysts, business analysts, business users and technical teams to identify the root cause of issues and agree the appropriate resolution plan.

Again, these are typical roles found in mid-to-large organisations with an established data and IT system landscape.

Data Quality Reporting and Planning

Finally, you would see your test analysts perhaps working with the business or technical communities to build an appropriate set of reports that will relay the right level of information at the right frequency.

At BiG EVAL we've tried to keep this process as painless as possible with plenty of pre-canned reports and monitoring statistics, which means you don't need business intelligence report designers and analytics specialists.

In terms of planning, as already mentioned, this would typically fall to more senior leaders within the organisation (e.g. COO, CFO, CMO or CTO), who work with the data quality management lead to agree a plan and roadmap for future data quality expansion.

What next?

Creating an operational data quality process may appear a complex undertaking, but it's quite straightforward and something we've observed many times with clients of our BiG EVAL Data Quality solution.

We recommend selecting an initial area of high value to the business and starting with a pilot initiative to demonstrate the immediate and long-lasting importance of protecting and monitoring the health of your critical data assets.

Book a discovery call today if you would be interested in undertaking an operational data quality pilot or simply learning more about our experience, technology, and approach in this area.

We'll talk in confidence about your situation and provide some practical advice to help you get started - book a call.
