So you have several terabytes of data sitting around, and someone asks if you are a “Big Data”
practitioner. Likely not. Well then, what is Big Data, how can you get some, and how can you leverage it
to drive your business forward?
Let’s start with a definition. The Oxford Dictionary defines Big Data as “extremely large
data sets that may be analysed computationally to reveal patterns, trends, and associations, especially
relating to human behavior and interactions.” OK. Still a little too ambiguous for you? Me too.
The term Big Data underscores a central concern that we at MacLaurin Group have expressed: data is only
one ingredient, albeit an important one, in creating a data-driven culture. Let’s examine cake for a
moment. You need flour to make a cake (ignoring the comments from the gluten-free contingent), but a cake
is not just flour. The full cake experience demands additional ingredients. Let’s see if we can create a
usable recipe for the ultimate objective: actionable analytic insight.
We will stick with the whole cake analogy. I only hope there are enough ingredients to support icing in this metaphorical confection. The data in Big Data is a very important ingredient and has some fundamental characteristics, as defined by Doug Laney’s three Vs: variety, velocity, and volume.
First up is variety. Let’s call this salt. Variety, as it turns out, is not only the spice of life; it is also important in the formulation of Big Data. Variety refers to the ever-increasing sources, forms, and contexts associated with Big Data. Most people are familiar with the data that organizations create in the course of doing business. The majority of this data lives in master data stores or data warehouses. It is typically structured and heavily manicured through batch processing. If this is all of the data you possess, your data is likely just rotund. To get Big, we need to carbo-load. There are lots of carbs in cake, so Big Data should also include unstructured data. Unstructured data is acquired through sources such as video, audio, images, mobile, social media, the Internet of Things (IoT), web interactions, etc. These types of data exist in many formats and do not lend themselves to the type of curation necessary for inclusion in a structured data environment.
Next is velocity. This feels like sugar. This is a real differentiator. Big Data is not just big, it is
also fast, like drinking out of the proverbial fire hose. Many of the sources mentioned under variety
above have a real-time or near-real-time aspect. That is important. Real-time data enables a business to
react to customers while they are still engaged with it.
If you want to be able to anticipate customer needs in that critical moment, then you need
up-to-the-minute insight into how they are engaging and/or what they are purchasing, as well as
historical information. Because this data is time sensitive, it does not lend itself to the manipulation
necessary to incorporate it into a structured data environment such as a data warehouse. If you did
incorporate it into the data warehouse first, you would not be able to interact in real time.
Volume is the baking powder. I’m struggling to keep this analogy afloat now. Dang, I should have gone nautical. Traditionally, companies were their own source of data. This has changed. The mountains of data being generated today are not just the result of a customer’s engagement with a company. Data also comes from an organization’s various professional affiliations and is generated by the customers themselves. All that IoT and social media data takes up a lot of space. It also provides incredible insight into customer preferences.
Then there is technology. We’ll call this milk. The whole reason Big Data is consumable is that there are now products that can deal with really large quantities of unstructured data. Products like Apache Spark, Amazon EMR, and Azure HDInsight make processing these large data sets possible. Additionally, who asks whether we should be cutting back on data storage now that it costs $0.001 per gig? Nobody! Keep it all and let the analysts sort it out! The Big Data technology topic really requires its own analogy to do it justice, so we will save that for another day.
We’ll call this icing. We made it. It was touch-and-go there for a little while. Cake without icing is
little more than sweet bread. In Big Data parlance, this would be analytic rigor. You can have morbidly
obese data and spend buckets of cash on technology, but in the end, if you lack the analytic resources
and a data-driven culture to support the creation of actionable insight, you have an upset executive
team with little to show for its investment. This is where the vast majority of companies seeking
analytic nirvana fall down. They focus exclusively on the “How” (the three Vs and technology) and
dedicate little or no energy to the “Why” (insights that affect outcomes). Don’t get me wrong: the “How”
is an important step, but this cake is only half-baked without the “Why”.
Let’s wrap this thing up before the cake analogy goes stale.
The question should not be “Do I have big data?” It should be “Do I foster a culture that
values data as a means to an end?”
The end in this case is analytic rigor that yields actionable insights aligned with an organization’s
strategic objectives. If you lack a data-driven culture and are currently throwing your flour, I mean
data, away, stop. While you may not be ready to preheat the oven, you can at least begin gathering the
ingredients.