In statistics, a hypothesis is proposed and then data samples are collected to prove or disprove it with an acceptable level of confidence. For example, let’s say our hypothesis is that all our customers are aware of all our product lines. There are two ways of assessing this hypothesis: (1) proving it and (2) disproving it.
The first way, proving our hypothesis, requires communicating with every one of our customers and asking whether they know all our product lines. The second way, disproving it, is to communicate with customers one at a time until we come across one who does not know all our product lines. Finding even one such customer disproves our hypothesis. This is why, in statistics, it is sometimes easier to disprove a hypothesis by finding an exception than to prove it.
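The asymmetry between proving and disproving can be sketched in a few lines of Python. The product lines, the survey answers and the function `find_counterexample` are all hypothetical, invented only to illustrate the reasoning above.

```python
from typing import Iterable, Optional

# Hypothetical product lines, invented for this illustration.
PRODUCT_LINES = {"hardware", "software", "services"}


def find_counterexample(
    responses: Iterable[tuple[str, set[str]]],
) -> Optional[str]:
    """Return the first customer who does not know every product line.

    Stops at the first exception, because one exception is enough to
    disprove "all customers know all product lines".
    """
    for customer, known_lines in responses:
        if not PRODUCT_LINES <= known_lines:
            return customer
    # Nobody contradicted the hypothesis. This only counts as proof
    # if `responses` covered every single customer.
    return None


# Hypothetical survey answers: (customer, product lines they know).
survey = [
    ("Customer A", {"hardware", "software", "services"}),
    ("Customer B", {"hardware", "software"}),
    ("Customer C", {"hardware", "software", "services"}),
]

first_exception = find_counterexample(survey)
print(first_exception)
```

Disproving needed only two of the three answers here, while a result of `None` would still say nothing about customers who were never asked.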
Big Data, on the other hand, inverts the generally accepted process: instead of proposing a hypothesis and then collecting data samples, it emphasizes collecting data first and then coming up with a hypothesis based on patterns found in the data. Generally speaking, when we talk about Big Data, we are concerned with the 3 Vs:
- Volume – Amount of data
- Velocity – Rate of data analysis
- Variety – Different data sources
Some have indicated that we need to go beyond just the above three Vs and should also include:
- Viscosity – Resistance to the flow of data
- Variability – Changes in the flow of data
- Veracity – Outlier data
- Volatility – How long the data remains valid
- Virality – Speed at which data is shared
I would take the Big Data concept a bit further and introduce:
- Vitality – General and specific importance of the data itself
- Versatility – Applicability of data to various situations
- Vocality – Supporters of data-driven approaches
- Veto – The ultimate authority to accept or reject Big Data conclusions
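A list of Vs this long is easier to act on once each V is scored. As a minimal sketch, assuming made-up ratings on a 1 to 5 scale and made-up priority weights (neither comes from a real organization, and the function name `weighted_rating` is hypothetical), a few of the Vs can be combined into a single weighted rating:

```python
# Hypothetical ratings (1 = poor, 5 = excellent) for a few of the Vs.
ratings = {"volume": 4, "velocity": 3, "variety": 5, "veracity": 2}

# Hypothetical weights reflecting organizational priorities;
# a higher weight means the V matters more to the organization.
weights = {"volume": 1, "velocity": 2, "variety": 3, "veracity": 4}


def weighted_rating(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the ratings, normalized by the total weight."""
    if ratings.keys() != weights.keys():
        raise ValueError("ratings and weights must cover the same Vs")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[v] * weights[v] for v in ratings) / total_weight


score = weighted_rating(ratings, weights)
print(f"Weighted rating: {score:.2f} out of 5")
```

Changing the weights while keeping the ratings fixed shows how the same initiative can look strong or weak depending on what the organization prioritizes.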
For a metrics-driven organization, one way to determine the effectiveness of your Big Data initiatives is to rate each of the Vs and weight those ratings by your organizational priorities. These priorities can include, but are not limited to, increasing employee retention rates, improving customer experiences, improving mergers and acquisitions activities, making better investment decisions, managing the organization effectively, increasing market share, improving citizen services, speeding up software development, improving design, becoming more innovative and improving lives. What all of this means is that data is not just data; it is, in fact, an organization’s most important asset after its people. Since data is now a competitive asset, let’s explore some of the ways we can use it:
- Monte Carlo Simulations – Determine a range of possible outcomes and their probabilities
- Analysis of Variance (ANOVA) – Determine if average results differ significantly between groups of data
- Regression – Determine if variables in the data are related and if that relationship can be used for forecasting
- Seasonality – Determine if the data shows patterns that repeat at regular intervals
- Optimization – Getting the best possible answer from the data
- Satisficing – Getting a good enough answer from the data
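To make the first item on this list concrete, here is a rough Monte Carlo sketch that uses only the Python standard library. The revenue model, its input distributions and the target figure are assumptions made up for illustration, not figures from any real organization.

```python
import random


def simulate_revenue(trials: int = 10_000, seed: int = 42) -> list[float]:
    """Simulate quarterly revenue under hypothetical, made-up assumptions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = []
    for _ in range(trials):
        # Assumed inputs: customer count is uniform between 800 and 1,200;
        # average spend per customer is normal (mean 50, std dev 10),
        # floored at zero.
        customers = rng.randint(800, 1200)
        avg_spend = max(0.0, rng.gauss(50, 10))
        outcomes.append(customers * avg_spend)
    return outcomes


outcomes = sorted(simulate_revenue())
n = len(outcomes)

# Range of scenarios: rough 5th, 50th and 95th percentile outcomes.
low = outcomes[int(0.05 * n)]
median = outcomes[n // 2]
high = outcomes[int(0.95 * n)]

# Probability of an outcome: share of trials at or above a target.
target = 60_000
p_target = sum(x >= target for x in outcomes) / n

print(f"5%: {low:,.0f}  median: {median:,.0f}  95%: {high:,.0f}")
print(f"P(revenue >= {target:,}): {p_target:.1%}")
```

The result is only as good as the assumed input distributions, so the percentiles describe the model rather than the real business.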
Now that we understand what Big Data is and how it can be used, let’s ask the following questions:
|Who is capturing data?||Who should be capturing data?|
|What is the lifecycle of your data?||What should be the lifecycle of your data?|
|Where is data being captured?||Where should data be captured?|
|When is data available for analysis?||When should data be available for analysis?|
|Why is data being analyzed?||Why should data be analyzed?|
Having discussed the positives of Big Data, we have to realize that it is not a panacea and has its negatives as well. Some of the ways data can lead to bad decisions include: (1) data can be correlated without that implying cause and effect, (2) data can be presented in pretty pictures that do not necessarily tell the truth and (3) biases can affect data at any stage, from capture to analysis to decision-making.
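The first of these pitfalls is easy to reproduce. In the sketch below, two made-up sales series are both driven by a hidden third factor (temperature), so they end up strongly correlated even though neither causes the other. All names and numbers are hypothetical.

```python
import random
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver: daily temperature over one year.
temperature = [rng.uniform(10, 35) for _ in range(365)]

# Both series depend on temperature plus independent noise;
# neither one is computed from the other.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
sunscreen_sales = [8 * t + rng.gauss(0, 30) for t in temperature]

r = pearson(ice_cream_sales, sunscreen_sales)
print(f"correlation between the two sales series: {r:.2f}")
```

A decision-maker who saw only the two sales series might conclude that one product drives the other, when the data was built so that only temperature drives both.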
In conclusion, the undistorted quality, understanding, and usage of data is the difference between just getting on the Big Data bandwagon and truly understanding how data can fundamentally change your organization.