Just two years ago, the term Big Data was rarely used in mainstream IT conversations the way it is today. Maybe it was popular in data ninja circles at chic java internet cafes. Unfortunately, I don't have the luxury of spending time in these cool places. And now there are conferences titled "Channeling the Big Data Tsunami" that I am drawn to. I can't get enough. Is this for real, or is it just a trendy fad? Should I pull out my moon boots?
Don't get me wrong - I love riding the wave ("Hang 10, Mom" as my teenager says). I just used to call these mammoth databases VLDBs (very large databases) and big piles of web logs a headache. I guess I'm just not trendy or cool. I asked some of my colleagues what they thought about Big Data and whether they thought it was a fad. For this conversation, I defined Big Data as "a data set that doesn't fit into the current analytical environment and requires things like compute clusters, software and algorithms to complete the analysis within a reasonable time".
I heard two themes in the answers.
Theme 1: Most of the data sets that we analyze or discover really only require a sample set to develop our model. If we need a bigger data set to improve our model accuracy, we go to The Man and ask for more storage for a bigger database or for a more powerful server either to transform and load data or to crunch complex analytics that would complete this week. Is it a fad? We'll see if more use cases drive it mainstream.
aka Old School.
Theme 2: This is not about using classical data analysis and statistics on a larger scale. If we could look at individual data points and find the answers we seek regardless of the size of the data set, we could provide a competitive advantage for our industry or launch science forward for the sake of mankind. Technology is the enabler - this is definitely not a fad.
aka Visionary.
The fact is Oracle, IBM and other database software providers have been in the Big Data business all along. This is not new - they just called it something different: a large transformation and load process to get new data into the current data warehouse platform, a VLDB. What is new is the introduction and advancement, within the last two years, of technology and algorithms from web companies (namely MapReduce running on Hadoop) that allow data analysts to glean value from data without importing it into a general-purpose relational database. This is huge. It means I can realize value from this data immediately with commodity servers, and maybe it means I don't have to buy more database licenses. Not to mention I can leverage open source algorithms that recent college graduates are proficient in (i.e., lower-cost labor and free software).
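For the curious, here is a rough sketch of the map/reduce idea in plain Python. To be clear, this is not Hadoop's API, just the shape of the technique on one machine: a map step turns each raw log line into a (key, count) pair, and a reduce step folds those pairs into totals, with no database load in between. The sample log lines, their format, and the function names are all made up for illustration.

```python
from functools import reduce

# Hypothetical raw web log lines (invented for this sketch); the last
# field on each line is assumed to be the HTTP status code.
LOG_LINES = [
    '10.0.0.1 - - "GET /index.html" 200',
    '10.0.0.2 - - "GET /missing" 404',
    '10.0.0.1 - - "GET /about.html" 200',
]


def map_line(line):
    """Map step: emit a (status_code, 1) pair for one log line."""
    status = line.rsplit(" ", 1)[-1]
    return (status, 1)


def reduce_counts(totals, pair):
    """Reduce step: fold one (key, count) pair into the running totals."""
    key, count = pair
    totals[key] = totals.get(key, 0) + count
    return totals


def count_statuses(lines):
    """Run the map step over every line, then reduce the pairs to totals."""
    return reduce(reduce_counts, map(map_line, lines), {})


if __name__ == "__main__":
    print(count_statuses(LOG_LINES))
```

On a real cluster the map and reduce steps would be spread across many commodity servers, each working on its own slice of the logs; the sketch only shows why the data never has to pass through a relational schema first.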
I don't believe Big Data will replace the traditional data analysis concepts that are used predominantly in today's data analysis applications. Most data sets fall in the small to medium size range, where current approaches suffice.
However, I believe that as we start to see more applications that commercialize and prove that there is a monetary benefit to finite data analysis of these massive data sets, Big Data systems will become a standard component of the mature IT landscape. The consolidation of this market will accelerate the adoption by making the consumption and deployment of these systems less risky and more integrated. It will be up to the acquiring vendors in the recent feeding (acquisition) frenzy to find these applications and educate the masses on why this is not a fad. Otherwise, like my moon boots, this cool concept will take its place on a shelf.