Sep 01, 2015

All Data is Not Created Equal

There is a staggering amount of data available to decision makers today. According to one estimate, 2.5 quintillion bytes of data are collected every day. As the world’s data continues its fast-paced growth, it is becoming increasingly important for businesses to have an infrastructure to manage and store it. However, experts say one key point is often overlooked: not all data is created equal. Distinguishing good data from bad data is key to managing data in a cost-effective and efficient manner.

An Example

Not all data is created equal. The quality and integrity of the data being gathered and used can have a significant impact on the business outcomes that are generated. According to the Data Warehousing Institute, “dirty data” costs U.S. businesses more than $600 billion annually. The following example illustrates the impact of bad data:

A transactional consumer packaged goods company has gathered data on its millions of existing customers and prospects across multiple geographies. If even 10 percent of that data is corrupted in some way, for example through redundant or erroneous customer information such as incorrect addresses or contact details, the company may be wasting millions of dollars creating targeted offers for consumers who are unreachable.

This example highlights the potential cost of using bad data to make operational or business decisions. A recent Gartner survey revealed that 140 companies surveyed lost an average of $8.2 million annually due to bad data, 30 companies estimated their losses at $20 million annually, and 6 companies estimated that they had lost more than $100 million annually.

Figure: Bad Data vs. Good Data

Volume is Not Enough

In today’s age of big data, user-friendly analytics, data visualization, and self-service reporting, HR organizations are still struggling to make good use of their data. According to a PricewaterhouseCoopers (PwC) study, there is a significant gap between the data CEOs say they want from HR and what they get. The issue is not a lack of data, since the volume of data available to HR has increased dramatically in the past few decades. In addition to administration and compliance data, HR can now access talent management data, along with social network and real-time behavioral data, such as information on professional connections and e-mail habits.

Figure: Gap between the HR Data CEOs Want and What They Get

Source: PwC

Simply having a huge volume of data is not enough to make strategic business decisions. To derive true value from data analysis, data analysts must ask themselves what data is critical to running the department and the business more efficiently.

HR Data Quality

To be useful to a company, HR data must be accessible, comprehensive, and valuable. HR data is no longer just about accurately paying workers. It has multiple facets, and can be used to engage talent, cultivate leaders, and drive change. To do so, however, data quality must be up to par: the data must be accurate, timely, complete, relevant, and consistent.

Data accuracy is the degree to which the data reflects reality. Assessing it involves reviewing the reliability of the source, the process used to obtain the data, and how closely the data matches what actually happened.

Timeliness looks at how old the data is in relation to when it will be used. An example of this is looking at data to analyze the turnaround time of processing new hire paperwork. If a person is hired on the first of the month, but the data is not entered into the system for 30 days, and that data is then not collected for a report until 15 days later, the data is already 45 days old before it can be used. It is important to have the most current or real-time set of data. Additionally, looking at the timeliness of data can reveal efficiencies or inefficiencies in processes.
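The arithmetic in the new hire example can be sketched in a few lines of Python. This is only an illustration: the function name `data_age_days` and the specific dates are assumptions chosen to match the 30-day and 15-day lags described above, not part of any particular HR system.

```python
from datetime import date


def data_age_days(event_date, entered_date, reported_date):
    """Split the age of a record into entry lag and reporting lag, in days."""
    entry_lag = (entered_date - event_date).days
    reporting_lag = (reported_date - entered_date).days
    return {
        "entry_lag": entry_lag,
        "reporting_lag": reporting_lag,
        "total_age": entry_lag + reporting_lag,
    }


# Hypothetical dates: hired on the 1st, entered 30 days later,
# pulled into a report 15 days after that.
lags = data_age_days(date(2015, 9, 1), date(2015, 10, 1), date(2015, 10, 16))
print(lags)
```

Tracking the two lags separately, rather than only the total, shows whether the delay comes from data entry or from reporting.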

Data completeness is often overlooked during data quality checks. Companies often do not store all of their HR data in a single system. For the data to be complete, every system, file, or database must contain the same population, so that a complete set of data exists for each person.
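One way to check that every system holds the same population is to compare the sets of employee identifiers in each. The sketch below assumes each system can export a set of IDs; the system names and IDs are hypothetical.

```python
from typing import Dict, Set


def completeness_gaps(systems: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each system, return the IDs that appear elsewhere but are missing from it."""
    everyone = set().union(*systems.values())
    return {name: everyone - ids for name, ids in systems.items()}


# Hypothetical exports from three HR systems.
systems = {
    "payroll": {"E001", "E002", "E003"},
    "benefits": {"E001", "E003"},
    "talent": {"E001", "E002", "E003", "E004"},
}
gaps = completeness_gaps(systems)
for name, missing in gaps.items():
    print(name, sorted(missing))
```

An empty set for every system means the populations match; any non-empty set lists the people whose records are incomplete.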

Data relevancy looks at the availability of required data elements. For example, if a company’s headcount is based on counts by building location, and that field is not required, the headcount loses its value, as the field may not be filled in for every person who should be included. Stored records should contain enough detail to address business needs, both for today and the future.
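The headcount example can be illustrated with a short check that counts people by building and flags records where the field is empty. The record layout (`employee_id`, `building`) is an assumption made for this sketch.

```python
from collections import Counter


def headcount_by_location(records):
    """Count employees per building and list the IDs with no building recorded."""
    counts = Counter()
    missing = []
    for rec in records:
        # Treat an absent key, None, or blank string as "not recorded".
        location = (rec.get("building") or "").strip()
        if location:
            counts[location] += 1
        else:
            missing.append(rec["employee_id"])
    return counts, missing


# Hypothetical records; two of them lack a building location.
records = [
    {"employee_id": "E001", "building": "HQ"},
    {"employee_id": "E002", "building": ""},
    {"employee_id": "E003", "building": "Plant 2"},
    {"employee_id": "E004"},
]
counts, missing = headcount_by_location(records)
print(dict(counts), missing)
```

Here the headcount by building covers only two of four employees, which is the gap the paragraph above warns about.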

Consistency refers to how codes or fields are interpreted by the person entering, supplying, collecting, or interpreting the data. For example, every department within the organization that uses a particular field should assign it the same meaning. Providing clear data definitions within the organization promotes consistency and helps ensure that data does not contradict itself.
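A simple consistency check compares how each department defines the same code. The sketch below assumes each department keeps a small data dictionary mapping codes to meanings; the departments, codes, and meanings are hypothetical.

```python
from collections import defaultdict


def conflicting_definitions(dictionaries):
    """Return the codes that departments define differently, with every meaning found."""
    meanings = defaultdict(set)
    for definitions in dictionaries.values():
        for code, meaning in definitions.items():
            # Normalize case and whitespace so trivial differences are ignored.
            meanings[code].add(meaning.strip().lower())
    return {code: sorted(found) for code, found in meanings.items() if len(found) > 1}


# Hypothetical data dictionaries from two departments.
dictionaries = {
    "hr": {"FT": "Full-time", "PT": "Part-time"},
    "finance": {"FT": "Fixed-term", "PT": "part-time"},
}
conflicts = conflicting_definitions(dictionaries)
print(conflicts)
```

Any code that appears in the result needs a single agreed definition before reports that rely on it can be trusted.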

Figure: Types of Data Prone to Quality Problems

Source: TDWI

Small Data

Often when people talk about data science, they tend to think about big data. However, small data is also extremely valuable. Small data is a dataset that contains very specific attributes, and is used to determine current conditions. It is important because it can trigger events based on what is happening now, and these events can be merged with behavioral information derived from big data.

For example, smart labels on medicine bottles use small data to determine where the medicine is located, its remaining shelf life, the seal status, and the current temperature conditions. Big data looks at this small data over time to examine the root cause of why drugs are expiring or spoiling. Perhaps it is due to a certain shipping company or a certain retailer. Pinpointing these recurring patterns can uncover problems in the supply chain that were otherwise hidden.
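As a rough illustration of how individual label readings could be rolled up to look for such patterns, the sketch below computes the share of problem readings per shipping company. The field names (`carrier`, `temp_c`, `seal_intact`), the temperature limit, and the readings themselves are all assumptions invented for this example.

```python
from collections import defaultdict


def spoilage_rate_by_carrier(readings, max_temp_c):
    """Return, per carrier, the fraction of readings that are too warm or have a broken seal."""
    totals = defaultdict(int)
    spoiled = defaultdict(int)
    for r in readings:
        totals[r["carrier"]] += 1
        if r["temp_c"] > max_temp_c or not r["seal_intact"]:
            spoiled[r["carrier"]] += 1
    return {carrier: spoiled[carrier] / totals[carrier] for carrier in totals}


# Hypothetical smart-label readings from two shipping companies.
readings = [
    {"carrier": "A", "temp_c": 5.0, "seal_intact": True},
    {"carrier": "A", "temp_c": 6.5, "seal_intact": True},
    {"carrier": "B", "temp_c": 12.0, "seal_intact": True},
    {"carrier": "B", "temp_c": 4.0, "seal_intact": True},
]
rates = spoilage_rate_by_carrier(readings, max_temp_c=8.0)
print(rates)
```

Each reading on its own is small data about one bottle; the per-carrier rate is the kind of aggregate view in which a recurring problem becomes visible.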

Analyzing these relatively small datasets can help companies optimize business processes and save millions of dollars. Small data informs analysts of what a tracked object is doing, while big data shows why the object is doing that.

“It’s about getting the business leaders the data they need – not the data that they think they want. Partner with internal stakeholders at the outset to identify the key business questions and the data you will need to answer those questions. Start small to build confidence and credibility.” ~Gene Pease, CEO at Vestrics