Definition: What Is Data Quality?
Data quality indicates how reliable a given dataset is. The quality of the data affects users’ ability to make accurate decisions about the subject of their study. For example, if data is collected from incongruous sources at varying times, it may not function as a good indicator for planning and decision-making.
High-quality data is collected and analyzed using a strict set of guidelines that ensure consistency and accuracy. Meanwhile, lower-quality data often does not track all of the relevant variables or has a high degree of error.
“Half the money I spend on advertising is wasted; the trouble is I don't know which half,” said US merchant John Wanamaker, who lived from 1832 to 1922. In the years since, this problem with insight into advertising spend and campaign effectiveness has not disappeared. Data quality is important to track and measure, especially in marketing analytics, because it allows organizations to make informed decisions regarding their campaigns and budgets quickly. If organizations make decisions based on inaccurate, incomplete, or otherwise skewed data, they run the risk of executing strategies or policies that are not reflective of their consumers’ preferences. At best, they waste their advertising spend; at worst, they can damage their relationships with customers.
Data Quality Standards and Criteria
To ensure high-quality data is collected, participants should agree on data quality standards before embarking on a project – this will create uniformity throughout the analysis.
There are six common dimensions of data quality standards. The exact standards may vary from project to project, but they often include the following criteria:
- Completeness / Comprehensiveness: Ask what essential fields have to be filled in for a dataset to be considered complete. For example, a customer’s name and address may be crucial to the completeness of the data, while their gender is less essential.
- Consistency: All iterations of a piece of data should be the same. Take a given month’s web traffic, for example: in every report, platform, or spreadsheet, is the number of website visits for that month the same? Or are there inconsistencies across these sources? A lack of consistency in these points could lead to confusion down the road.
- Accuracy: While consistency is about having the same value across all channels, accuracy is about ensuring those consistent values are correct and closely reflect the reality of the results.
- Format: To avoid inaccuracy or confusion, make sure data entry formats are consistent. For example, you do not want the year to be entered in some locations as ’19 and in other locations as 2019.
- Timeframe: Timeliness of data refers to whether marketers have data insights at the optimal time, and how current the data is. Do you have the data when you need it, and are you referencing the most up-to-date version of the dataset?
- Validity / Integrity: This criterion looks at whether a dataset follows the rules and standards that have been set. Are any values missing that could harm the efficacy of the data or keep analysts from discerning important relationships or patterns?
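Several of these dimensions can be checked automatically at collection time. The sketch below shows one minimal way to do that, assuming a simple list-of-dicts dataset; the field names and rules (required name and address, a four-digit year) are hypothetical examples, not a prescribed standard.

```python
# A minimal sketch of automated data quality checks against the dimensions
# above: completeness (required fields) and format (consistent year entry).
# Field names and rules are hypothetical examples.
import re

REQUIRED_FIELDS = ["name", "address"]   # completeness criteria
YEAR_FORMAT = re.compile(r"^\d{4}$")    # format rule: 2019, not '19

def check_record(record):
    """Return a list of quality issues found in one record."""
    issues = []
    for field in REQUIRED_FIELDS:       # completeness check
        if not record.get(field):
            issues.append(f"missing required field: {field}")
    year = record.get("year", "")
    if year and not YEAR_FORMAT.match(year):  # format check
        issues.append(f"bad year format: {year!r}")
    return issues

records = [
    {"name": "Ada", "address": "1 Main St", "year": "2019"},
    {"name": "", "address": "2 Oak Ave", "year": "'19"},
]
for r in records:
    print(r["name"] or "<blank>", check_record(r))
```

In practice, checks like these would run as part of the data collection pipeline, so that records failing the agreed standards are flagged before they reach analysts.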
The Benefits of Quality Data
The main reason that organizations invest in maintaining high-quality datasets is to make informed decisions that will offer returns for the business. For example, if the data shows that your customers stay out shopping later on Saturday night than Tuesday night, you may choose to extend your hours on Saturdays, consequently winning more business.
High-quality data facilitates strong decision-making in the following ways:
- Make good business decisions fast: In today’s consumer-centric market, organizations have to evaluate data to understand consumer desires as they develop, and shift their strategies accordingly. With this in mind, accurate and current data is essential. When organizations are using high-quality data, they can also be more confident that they are making the right decisions in this ever-evolving market.
- Work across teams: When different departments have access to consistent data, it’s easier for companies to stay aligned on priorities, messaging, and branding – yielding more strategic, cohesive results.
- Get a holistic view of the customer: Customer data provides insights on your clients’ interests and needs. This allows organizations to build better relationships while creating products and campaigns that are informed by specific consumer needs and desires.
The Problem with Bad Data
Data quality is a challenge for many companies - and the problem is often worse than organizations realize. Wanting to work quickly to collect data and use it to optimize programs in near real-time, organizations may skip data quality assurance practices, such as establishing standards and criteria. This can easily lead to reliance on inaccurate, incomplete, or redundant data, creating a domino effect of decisions based on inaccurate numbers and metrics.
Additionally, because organizations are now working with massive sets of big data, many do not have the data science resources available in-house to sort and correlate this information. Without the proper tools and analysts to sort this data, organizations will miss out on time-sensitive optimizations.
According to one study, only 3 percent of executives surveyed had data records that fell within the acceptable range. Moreover, 65 percent of marketers are concerned with the quality of their data, and 6 out of 10 marketers list improving data quality as a top priority. Consider the following implications of bad data:
- High Costs: According to IBM, bad data quality cost organizations $3.1 trillion in 2016. In fact, nearly 50 percent of newly acquired data records have errors that could negatively impact the organization. Additionally, according to MIT, bad data can cost organizations as much as 25 percent of total revenue.
- Wrong Decisions: Basing business decisions on faulty or incomplete data can result in your team overlooking a critical piece of information. Consider this: The brand awareness your out-of-home ads are generating could be responsible for the majority of conversions. However, if your company uses an incomplete attribution model, you may allocate funds to the wrong media vehicles, instead of the media that is driving the most results. This would ultimately lead to reduced ROI.
- Strained Customer Relationships: Bad data doesn’t just impact your advertising budgets - it can impact your customer relationships as well. If bad data leads you to target a customer with products and messaging that do not align with their interests and preferences, it can quickly sour them to the brand. This may cause them to opt out or disregard future messaging.
How Do You Assess Data Quality?
Given the consequences of bad data, companies need to understand how to evaluate data so it best suits their needs. This includes establishing metrics and processes to assess data quality. According to the article “Data Quality Assessment” by Pipino, Lee, and Wang, companies must strive for their data to score high in both objective assessments and subjective assessments.
In order to improve data quality organizations must complete the following:
- Evaluate objective and subjective data quality metrics
- Analyze the results and determine the reason behind any incongruities
- Determine next steps for improvement
Subjective assessments measure how stakeholders, analysts, collectors, and other participants perceive the quality of the data. If a stakeholder is tasked with making a business decision based on a dataset they feel may be incomplete or inaccurate, this perception will ultimately affect the decision they make.
Objective Data Quality Assessments
Objective data quality looks at objective measurements recorded in the dataset, which can be evaluated in the context of the given task, or independently from a purely metrics-based perspective. To establish metrics by which to assess objective data quality, organizations can develop KPIs that match their needs using one of several functional forms. When performing objective assessments, there are three functional forms for measuring quality. These include:
- Simple ratio
This measures the ratio of desired outcomes to the total possible outcomes. The ratio ranges between 0 and 1, with 1 being the most desirable result.
Completeness and consistency can be measured through this ratio. However, both of these dimensions can be measured in different ways, so organizations need to determine which criteria best capture them.
- Min or Max
This functional form is designed to handle multiple data quality variables.
The min is designed to be a more conservative number, while the max is a more liberal number. Variables such as appropriate level of data can be represented by min. Timeliness and accessibility can be represented by max.
- Weighted average
This is an alternative to min or max and can be used when organizations understand the relative importance each variable contributes to the equation.
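The three functional forms above can be sketched in a few lines of code. The ratings and weights below are hypothetical illustrations, assuming each quality variable has already been normalized to a 0-to-1 scale.

```python
# A sketch of the three functional forms for objective assessment.
# The example counts, ratings, and weights are hypothetical.

def simple_ratio(undesirable, total):
    """1 minus (undesirable outcomes / total outcomes); ranges 0 to 1."""
    return 1 - undesirable / total

# Simple ratio: e.g. completeness as the share of non-missing values.
completeness = simple_ratio(undesirable=50, total=1000)    # 0.95

# Min: conservative aggregate of several normalized quality ratings.
ratings = {"believability": 0.8, "appropriate_amount": 0.9}
conservative = min(ratings.values())                       # 0.8

# Weighted average: used when each variable's importance is understood.
weights = {"believability": 0.7, "appropriate_amount": 0.3}
weighted = sum(weights[k] * ratings[k] for k in ratings)   # 0.83
print(completeness, conservative, weighted)
```

Note that the weights must sum to 1 for the weighted average to stay on the same 0-to-1 scale as the individual ratings.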
After evaluating objective and subjective data quality metrics, organizations must take the next steps to improve their processes. Companies may find they are lacking data completeness or data quality. Below we will outline some best practices for overcoming data quality challenges.
Overcoming the Challenges with Data Quality and Clean Data
What is Data Cleansing?
Working with bad data comes with consequences ranging from extra cost to added time. To avoid these negative outcomes, many organizations will undertake data cleansing projects. Data cleansing is the review and correction of records or databases to rid them of redundancies and inaccurate, incomplete, or otherwise misleading information that can skew results and cause erroneous or impractical decisions. Ultimately, the goal of data cleansing is to improve overall data quality before making business decisions. In some situations with small data sets, it may be possible to clean data manually, while for large datasets a data cleansing tool or platform may be needed.
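For a small dataset, manual cleansing can be as simple as normalizing inconsistent formats and dropping redundant records. The sketch below illustrates this, assuming a toy dataset of email/year records; the field names and the two-digit-year rule are hypothetical examples, not a general-purpose cleansing tool.

```python
# A minimal data-cleansing sketch: normalize inconsistent year formats
# ('19 vs 2019) and remove redundant records. Field names are hypothetical.

def clean(records):
    seen = set()
    cleaned = []
    for r in records:
        year = r["year"].strip().lstrip("'")
        if len(year) == 2:                # normalize '19 -> 2019
            year = "20" + year
        key = (r["email"].lower(), year)  # dedupe on a normalized key
        if key in seen:
            continue                      # drop redundant record
        seen.add(key)
        cleaned.append({**r, "year": year})
    return cleaned

raw = [
    {"email": "a@x.com", "year": "2019"},
    {"email": "A@X.com", "year": "'19"},  # duplicate after normalization
    {"email": "b@x.com", "year": "2020"},
]
print(clean(raw))  # two records remain
```

Note that the duplicate is only detectable after normalization, which is why cleansing typically standardizes formats before deduplicating.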
In addition to cleansing data with manual processes or with data cleansing tools, organizations can overcome the challenges associated with bad data by establishing defined policies and responsibilities during the data collection process. This way, team members will have a clear understanding of what is expected of them and which standards their data entries must meet.
For access to clean high-quality data, make sure to implement the following best practices:
- Establish Clear Responsibilities: Create positions such as Data Manager or Data Governance Manager. These roles will be responsible for creating data collection and cleansing policies to ensure quality. They will also disseminate data collection and use best practices across various teams and departments to ensure optimal results. Review the Data Governance section of this page to see what these roles look like.
- Establish a Clear Process: Do your sales and marketing teams have a clear process for how they handle leads? Do your customer service teams have a clear way to mark that users had questions about a product feature? Good data quality is an organization-wide effort. Having a clear process in place will make it easier to keep clean records in terms of what data is collected and how that data is formatted to make information easily accessible.
- Combining Different Data Sets through Technology and Data Scientists: One of the biggest challenges for organizations is combining and correlating datasets that differ in format or measure different KPIs. This can include unstructured and structured data, or how to most effectively measure online and offline campaigns. Data management leaders can work with technology partners to combine large datasets in a way that makes them comparable. In fast-paced business landscapes, seek a partner with the processing power to deliver these datasets in near real time. From there, organizations will need to employ data scientists, or leverage a third-party service, who will play a role in interpreting these results and turning them into actionable next steps for your team.
What is Data Quality Management?
Data quality management refers to the implementation of policies and technology that enforce data quality standards. Effective data quality management allows organizations to make informed decisions quickly, reduce costs, and maintain compliance with data governance regulations.
Data Quality Management Stakeholders
Strong data quality management programs will require involvement from the following roles:
- Data Owner / Governance Team: The data governance team establishes the processes and protocols that must be enforced to ensure high-quality data. These team members will also be responsible for selecting data management and analytics platforms when necessary.
- IT: IT buy-in and support is essential to ensure proper configuration and uptime of data management and analysis tools and solutions.
- Data Stewards: Data stewards are employees across the departments that collect, analyze, and make data-informed decisions on a daily basis. Data stewards evaluate data input to ensure it meets the quality standards and policies assigned by the data governance team.
In addition to positions specific to data management, effective data management requires buy-in from business executives, individual departments and the data collectors for those departments, analyst teams, and legal and security teams where data regulations are concerned.
Data Quality Management Process
Data quality management will generally be composed of the following stages:
- Data Profiling: At this phase of the data management process, data governance team members evaluate which data assets will be involved in the project, and the maturity of their current program. From there, team members can determine what steps have to be taken to reach their data quality goals.
- Data Reporting: Data reporting is the process of identifying and reporting (or eliminating) all incongruent data. Once the data assets and program goals have been defined, data management teams can begin to enhance datasets with data reporting.
- Data Repair / Enrichment: Data repair and enrichment is the process of improving the dataset by ensuring quality standards are being adhered to, and adding additional context where necessary.
What Data Governance Roles Does My Organization Need?
While data management is the implementation of processes, tools and policies, data governance refers to the overarching team that creates and enforces them. These teams will typically be broken down across business levels including: strategic, tactical, and operational.
- Data Manager
- This is typically a senior role in the company, working at the strategic level of governance. The data manager is responsible for a given set of data and for maintaining its integrity. They are usually assisted in this task by data stewards.
- Data Stewards
- Data stewards operate at both the tactical and operational levels of governance and are responsible for the daily maintenance of data integrity. They may construct the plan for maintaining data quality, which is approved by the data manager.
- Data Producers
- Data producers work in the operational tier of data governance. These team members create, update, and delete data entries. Most employees create or are responsible for data in some form, so it’s important for them to understand how to handle data. For example: If your sales team is using a CRM system, it’s important for them to keep their accounts up to date. Leaders on the governance team must educate these employees on data collection and use standards.
- Data Users
- Data users are those that leverage data to make decisions. If they see errors or need additional pieces of data, it is their job to alert those responsible for the dataset.
With data quality standards being adhered to and enforced across these roles, organizations can maintain high-quality data repositories that will yield optimal outcomes.