There. I said it.
Given the buzz around the topic these days, that possibility is difficult to imagine. But to be fair, I suppose it really depends on the questions being asked.
We’re hearing more and more from our customers about interest in Big Data and the associated buzzwords (namely Hadoop), but there continues to be confusion about what Big Data really is and what problems the associated solutions actually solve. Often, a customer simply wants to run a Hadoop, Kafka, or Spark prototype to better understand the technology, without considering the “why” or identifying a relevant use case that warrants exploring the solution. I’m all for experimentation, but it should happen within a relevant context; too often the experiment is misaligned with what the technology is intended to address. My colleague, Jeremy Wortz, addresses a very similar theme in this article.
There’s no doubt that Big Data is a relevant and timely topic. SINTEF reported in 2013 that 90% of the world’s data was produced in the previous two years, and it’s fairly safe to assume that pattern is progressing exponentially. 500 million Tweets are produced each day on Twitter, 70 million photos are shared daily on Instagram, and applications within many organizations can produce tens of thousands of log entries or more each second. Taken together, all of this data and more like it may reveal patterns that can inform decisions for many organizations. If you want a well-supported example of the power of data volume, read The Unreasonable Effectiveness of Data by Google Research (March 2009).
However, that doesn’t mean that every organization requires a “Big Data” solution now, and those that do should consider the use cases and problem attributes driving the need. To make things more confusing, the Big Data ecosystem is seemingly growing as fast as the data landscape. Hadoop started as a distributed file system (HDFS) and MapReduce architecture, but has grown to include multiple enterprise vendors, and multiple technologies are included in “Hadoop” platforms and uttered in the same breath (Spark, Flume, Pig, Hive, Drill, Impala, Kafka, and Storm to name just a few).
In this blog series, I’ll clarify a couple of high-level “Big Data” problem scenarios and, in following posts, outline some of the potential solution elements associated with each problem domain. The hope is that this series will categorize some common Big Data problem scenarios and shed some light on the relevant solution architectures and technology. We’re excited about the promise of “Big Data” and even more excited about the power of the technology emerging in this area. However, as the landscape continues to evolve, it’s becoming difficult to navigate. That makes it more important than ever to consider priorities, ask the right questions, and develop a Big Data foundation that is actually tuned to deliver meaningful incremental value as it evolves with your business.
 Åse Dragland, Big Data – for better or worse, SINTEF, http://www.sintef.no/home/corporate-news/big-data–for-better-or-worse/ (May 22, 2013)