With all the fuss about Big Data lately, one could be forgiven for writing it all off as hype. All the major vendors now have “Big Data” products, solutions, and marketing, yet nothing seems to have changed for most businesses – even those that have invested heavily in Big Data. Part of the issue has been understanding what all this Big Data stuff really means, and part has been how it is approached. Few organizations are realizing the benefit of Big Data and new tools because they are approaching them in old ways.
Nearly all established large vendors and providers have approached Big Data from a technology-only standpoint, with the most popular strategy being: “How do we make our existing products work with or on Hadoop?” The result has been a vendor solution approach of “How do we do the business intelligence projects we’ve done for decades with our new Hadoop-connected software?” Here is where everything goes off the tracks.
The methodology behind successful OLAP / business intelligence solutions of the past has necessarily been a waterfall approach. OLAP tools and platforms require design decisions to be made upfront that have a significant impact on your ability to report data out of these types of solutions. In effect this means a heavy upfront design and analysis phase, followed by implementation phases and a testing phase. This has worked very well for most of the areas it is applied to, such as demand forecasting, profitability tracking, and accounting. You could almost look at these as the hard sciences of information: they are fundamentals, and they are not very likely to change once they are established (and rarely do they need to).
This same approach falls flat with Big Data, which brings with it much more diverse information sources. A much lighter-weight, exploratory approach is needed to take advantage of the proliferation of data in the modern organization. Much like agile methodologies changed software development over the last decade, a similar approach is needed for Big Data. Unlike earnings or financial calculations, which are readily understood and described, the value in the diverse sources that are beginning to be called Big Data is much more opaque. To make things more complex still, the value is likely to be vastly different depending upon who the consumer of this information is – and when they consume it. Whereas sales and finance were probably the most likely consumers of OLAP reporting, marketing, product development, and operations are more likely to find value in this expanded Big Data story.
The real impact of Hadoop on Big Data will not be its ability to do OLAP cheaper or with larger data sets, but its ability to support rapid development and data exploration – building truly advanced analytics. Instead of spending months designing a solution before discovering whether it has any value, the approach is simply to store the data first and go back later to project a solution onto the data already at rest. To an experienced OLAP team this may seem ludicrous or downright blasphemous, but it is a necessary part of the Big Data experience. It works because the cost of storing this data is so much lower than in OLAP solutions that you can afford to experiment and find value incrementally – or scrap a project completely.
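The contrast is easiest to see in code. The following is a minimal, hypothetical sketch of the "store first, project later" idea (often called schema-on-read): raw records are kept exactly as they arrive, and an analyst applies a schema at read time, months later, without any upfront table design. The record contents and the `project` helper are invented for illustration, not part of any particular Hadoop tool.

```python
import json

# Raw event records stored as-is, with no upfront schema.
# (A schema-on-write OLAP approach would force table design before loading.)
raw_events = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": 42.50}',
    '{"user": "alice", "action": "purchase", "amount": 19.99}',
]

def project(records, fields):
    """Apply a schema at read time: keep only the fields this analysis
    cares about, tolerating records that happen to lack them."""
    for line in records:
        event = json.loads(line)
        yield {f: event.get(f) for f in fields}

# Later, an analyst projects a "purchases" view onto the data already
# at rest -- no reload, no redesign of the storage layer.
purchases = [r for r in project(raw_events, ["user", "amount"])
             if r["amount"] is not None]
total = sum(r["amount"] for r in purchases)
```

If this particular projection turns out to have no value, nothing was lost but the query itself; the raw data is untouched and ready for the next experiment.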
The major vendors that have jumped on the Big Data bandwagon are unlikely to mention this, and neither are the many solution providers who have spent years developing carefully crafted waterfall approaches. The fact is that, just as in software development, agile methodologies will add immense value to advanced analytics – and they are really the key to success.