Stakeholders expect high-quality data in Business Intelligence reports, and they can quickly lose confidence if even a small amount of low quality data rears its head. So how do you know when you’re looking at high quality data?
A recent project of mine was for the finance department of a large health care organization. The project involved calculating profitability down to the level of individual medical tests given to individual patients, and as with any profitability report, data quality was a huge concern.
Addressing Data Quality
One way to address and ensure data quality is to perform the following actions:
- Perform reconciliation.
- Flag and fix the major issues contributing to low data quality.
- Filter out low quality data from reports where anything less than high data quality is unacceptable.
- Create a report that shows where issues in data quality need to be addressed.
1.) First, perform a reconciliation. Reconciliation should be a required step on every project that integrates data from one or more source systems. It confirms that all source system data that should be in the target database actually made it there. (It’s also easier to test and find data anomalies once all the source system data is in the database.) Here’s an illustration of the type of output you should see from a reconciliation effort:
In the diagram above, the yellow highlighted box and bold text in the “% of Missing Data” column mark data that not only has potential quality issues but also prevents report users from seeing a complete set of data. A complete data set is a prerequisite for the next activity: flagging and fixing low quality data.
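The reconciliation output described above can be sketched in Python. This is a minimal illustration, not the method used on the project: it assumes you can pull the set of business keys per entity from both the source system and the target database, and the entity names, keys, and the `reconcile` function are hypothetical. Comparing keys rather than row counts is a deliberate choice, since a missing record and a duplicate can cancel each other out in a simple count.

```python
from dataclasses import dataclass, field


@dataclass
class ReconciliationRow:
    entity: str
    source_count: int
    target_count: int
    missing_keys: set = field(default_factory=set)

    @property
    def pct_missing(self) -> float:
        # Share of source records that never arrived in the target database
        if self.source_count == 0:
            return 0.0
        return 100.0 * len(self.missing_keys) / self.source_count


def reconcile(source_keys: dict, target_keys: dict, threshold_pct: float = 0.0):
    """Compare business keys entity by entity (hypothetical helper).

    Returns (row, highlight) pairs; highlight is True when the share of
    missing records exceeds threshold_pct (the "yellow box" case).
    """
    results = []
    for entity in sorted(source_keys):
        src = set(source_keys[entity])
        tgt = set(target_keys.get(entity, ()))
        row = ReconciliationRow(
            entity=entity,
            source_count=len(src),
            target_count=len(tgt),
            missing_keys=src - tgt,  # keys in the source but not in the target
        )
        results.append((row, row.pct_missing > threshold_pct))
    return results


# Hypothetical example: two lab test records failed to load.
source = {"patients": {1, 2, 3, 4}, "lab_tests": {10, 11, 12, 13, 14}}
target = {"patients": {1, 2, 3, 4}, "lab_tests": {10, 11, 12}}
for row, highlight in reconcile(source, target):
    marker = "  <-- investigate" if highlight else ""
    print(f"{row.entity:<10} {row.source_count:>6} {row.target_count:>6} "
          f"{row.pct_missing:>6.1f}%{marker}")
```

Here `lab_tests` is highlighted at 40.0% missing, and its `missing_keys` tell you exactly which records to chase.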
2.) Flag and fix low quality data. Once all of the data is in the system and transformed, consider implementing logic that validates the data and detects records that don’t meet the defined quality bar. (The quality bar can be defined and documented during the business requirements sessions earlier in the project lifecycle.) Next, flag the records that contain low quality data. This allows developers, testers, and even IT support users to go back and analyze that data later.
Applying a generic flag to low quality records provides several benefits. It is much easier to isolate the root cause of an issue, because low quality data is no longer intermingled with high quality data. It is also easier to fix most of the low quality data, because the team can now spend its time categorizing and quantifying the scenarios that have the largest impact on data quality. Finally, the flag gives reports a layer of abstraction for filtering: a report can simply exclude flagged records instead of re-implementing every validation rule.
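One way to implement the validate-and-flag step is to express the quality bar as a set of named rules and attach both a generic flag and the list of failed rules to each record. The sketch below assumes records arrive as Python dicts; the field names (`patient_id`, `charge`, `cost`) and the three rules are hypothetical stand-ins for whatever the business requirements actually define.

```python
from typing import Callable, Dict, List

# Hypothetical quality bar: each rule returns True when a record FAILS it.
# Real rules would come from the business requirements sessions.
QUALITY_RULES: Dict[str, Callable[[dict], bool]] = {
    "missing_patient_id": lambda r: not r.get("patient_id"),
    "missing_cost": lambda r: r.get("cost") is None,
    "negative_charge": lambda r: (r.get("charge") or 0) < 0,
}


def flag_records(records: List[dict], rules=QUALITY_RULES) -> List[dict]:
    """Return copies of the records with a generic dq_flag plus the failed rule names."""
    flagged = []
    for rec in records:
        issues = [name for name, fails in rules.items() if fails(rec)]
        # Copy instead of mutating, so the loaded data itself is left untouched
        flagged.append({**rec, "dq_issues": issues, "dq_flag": bool(issues)})
    return flagged


rows = [
    {"patient_id": "P1", "test_code": "CBC", "charge": 120.0, "cost": 45.0},
    {"patient_id": "P2", "test_code": "LIPID", "charge": 90.0, "cost": None},
    {"patient_id": None, "test_code": "CBC", "charge": -5.0, "cost": 40.0},
]
for r in flag_records(rows):
    print(r["test_code"], r["dq_flag"], r["dq_issues"])
```

Keeping the individual rule names next to the generic `dq_flag` is what makes later categorization possible: reports only need the flag, while the team fixing the data can group by the specific rule that failed.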
3.) Filter low quality data. Filtering out low quality data has several advantages. One advantage is that it separates good quality data from low quality data. This in turn gives business users confidence that different levels of data quality can be detected and segregated, and that increased confidence allows them to trust the reports.
The key advantage, however, is that business users get access to the highest quality data, which often provides enough information to approximate a gross margin and make a decision.
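A rough sketch of filtering on the flag and approximating gross margin from the remaining records follows. It assumes each record already carries a boolean `dq_flag` plus hypothetical `charge` (revenue) and `cost` fields, and it uses the standard definition gross margin = (revenue - cost) / revenue. Reporting coverage alongside the margin is a design choice of this sketch: it tells the business user how much of the data the estimate is based on.

```python
from typing import List, Optional


def filter_high_quality(flagged_records: List[dict]) -> List[dict]:
    # Keep only records that passed every rule (dq_flag is False)
    return [r for r in flagged_records if not r["dq_flag"]]


def approximate_gross_margin(records: List[dict]) -> Optional[float]:
    # Gross margin = (revenue - cost) / revenue, treating charge as revenue
    revenue = sum(r["charge"] for r in records)
    cost = sum(r["cost"] for r in records)
    if revenue == 0:
        return None
    return (revenue - cost) / revenue


def margin_report(flagged_records: List[dict]) -> dict:
    good = filter_high_quality(flagged_records)
    coverage = len(good) / len(flagged_records) if flagged_records else 0.0
    return {
        "records_used": len(good),
        "records_excluded": len(flagged_records) - len(good),
        "coverage": coverage,  # share of records the estimate is based on
        "gross_margin": approximate_gross_margin(good),
    }


# Hypothetical flagged records; the third would break the cost sum if not filtered.
flagged = [
    {"charge": 120.0, "cost": 45.0, "dq_flag": False},
    {"charge": 80.0, "cost": 50.0, "dq_flag": False},
    {"charge": 90.0, "cost": None, "dq_flag": True},
]
print(margin_report(flagged))
```

In this example the margin is computed from the two clean records only: (200 - 95) / 200 = 0.525, based on two thirds of the records.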
The project mentioned above is an excellent example of filtering data. It involved many complex scenarios during data extraction and transformation, and filtering out low quality data was the critical success factor in closing the project with high data quality. Some might say that filtering out low quality data simply avoids the problem, but I would argue that it ensures the company has enough high quality data to make an informed decision.
4.) Raise Visibility to Low Data Quality
One of the most important points to remember is that finding the low quality data is often not enough; that data may need to be fixed as well. Once you’ve flagged the low quality data, you have enough tangible information to provide to those who’ll decide if it needs to be repaired.
Low data quality can be quantified, and those numbers can be used to drive changes with upstream system owners. Quantified issues are visible evidence that a problem exists, and that evidence can be used to determine the appropriate next steps.
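To make the evidence concrete for upstream owners, the flagged records can be rolled up by source system and issue. This sketch assumes each flagged record carries a hypothetical `source_system` field and a `dq_issues` list of failed rule names; it uses `collections.Counter` to rank the combinations with the largest impact first.

```python
from collections import Counter
from typing import List, Tuple


def summarize_issues(flagged_records: List[dict]) -> List[Tuple[str, str, int, float]]:
    """Count failed rules per (source system, issue), largest first.

    A record with several issues is counted once per issue, so the
    percentages can add up to more than 100%.
    """
    total = len(flagged_records)
    counts = Counter(
        (r["source_system"], issue)
        for r in flagged_records
        for issue in r["dq_issues"]
    )
    return [
        (system, issue, n, 100.0 * n / total)
        for (system, issue), n in counts.most_common()
    ]


# Hypothetical flagged records from two upstream systems
flagged = [
    {"source_system": "LAB", "dq_issues": ["missing_cost"]},
    {"source_system": "LAB", "dq_issues": ["missing_cost", "negative_charge"]},
    {"source_system": "BILLING", "dq_issues": []},
    {"source_system": "BILLING", "dq_issues": ["missing_patient_id"]},
]
for system, issue, n, pct in summarize_issues(flagged):
    print(f"{system:<8} {issue:<20} {n:>4} {pct:>6.1f}% of records")
```

A ranked list like this gives each system owner a specific, measurable request (for example, missing costs in half of the records from one system) rather than a general complaint about data quality.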
Thanks for reading! Leave your questions and/or comments below, or contact us for more information on business intelligence best practices.