The world of predictive analytics has so many points of view, tools, and processes that its definition can seem to shift by the day. With the adoption of data science, many organizations have accelerated their investment in converting their vast stores of data into actionable predictions. Enterprises are adapting to integrate predictive analytics and become more data-driven. This blog is a guide to the current state of analytics and serves as an introduction to more detailed blogs centered on the topics presented.
As we consider today’s state of technology, it’s important to recognize the differences between the types of products available: those that are pricey but robust, those that are entirely free but have particular drawbacks, and some of the newer, emerging technologies. Take a look at the graphic below to better understand how some key analytics providers compare across multiple factors: cost, ease of use, talent pool depth, big data readiness, and flexibility.
SAS and SPSS: SAS grew out of a statistical programming language developed in the 1960s and has evolved into a suite of enterprise solutions ranging from BI and DI to specialized functions in quality and supply chain. SAS’s strength lies in the ease of use of its GUI-based tools, such as Enterprise Miner. The base language is procedure-based rather than object-oriented, which is a major drawback given that the most effective tools are object-oriented, and it is difficult to find common conventions that span the entire language. However, SAS is deeply institutionalized in many academic settings and enterprises, and many analysts still produce their analytics in only that one language.
Over the last 5-10 years, there has been a considerable shift away from paid technology and toward R and Python.
R: R is completely open source and comes with fantastic IDEs (RStudio), server support, and thousands of packages that can handle almost any analytics problem thrown at it. R also has fantastic interactive visualization capabilities through Shiny, as well as a nice GUI-driven data mining tool, Rattle. Lack of support does tend to be a drawback with open source solutions, but it is becoming more common to find support offered as a managed service by many consulting firms and even by traditional software vendors such as Oracle and SAP.
Python: Python is beginning to rival R as its capabilities have evolved in recent years. Python has traditionally been known as a general-purpose development language, but recent investments in statistical and machine learning packages have accelerated its adoption in the data science community. Python can now even call R code, and it is integrated with most big data platforms. It rivals R and paid solutions in many institutions (not to mention its extensive developer base outside of the machine learning space). pandas (efficient data structures), NumPy, SciPy, and scikit-learn are all excellent packages that make Python data science ready. It can also handle unstructured data with NLTK, the stand-by package for natural language processing. There is even a graphical data mining tool, Orange, that is built on Python.
AzureML: AzureML is one of the many new software-as-a-service solutions for advanced analytics and machine learning. Much of it is new and experimental (it entered public preview in 2014), but it is shaping up to be a very interesting tool that leverages a variety of machine learning approaches, processes unstructured data, and compiles a predictive scoring job into a web service API call. That last piece is especially interesting, as it takes the pain out of integrating score code on foreign platforms and makes the application of predictive models very flexible and potentially real-time.
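To make the "scoring as a web service" idea concrete, here is a minimal sketch in plain Python, using only the standard library. It is not AzureML's actual API: the `/score` path, the feature names (`tenure_months`, `support_calls`), and the coefficients are all hypothetical and stand in for whatever a real trained model would provide. The point is only to show the pattern, where a client on any platform sends features as JSON and gets a prediction back without ever running the score code itself.

```python
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical coefficients standing in for a trained model (assumption).
COEFFICIENTS = {"tenure_months": -0.05, "support_calls": 0.4}
INTERCEPT = -1.0


def score(features):
    """Return a probability from a simple logistic model over the features."""
    z = INTERCEPT + sum(
        COEFFICIENTS[name] * float(features[name]) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-z))


class ScoreHandler(BaseHTTPRequestHandler):
    """Accepts a JSON object of features via POST and replies with a score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body, status = {"probability": score(features)}, 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or a missing/invalid feature yields a 400.
            body, status = {"error": str(exc)}, 400
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for the demo.


def start_server(port=0):
    """Start the scoring service on a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_score(url, features):
    """Client side: POST features as JSON and return the predicted probability."""
    request = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy settings, since the service is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["probability"]


if __name__ == "__main__":
    server = start_server()
    url = "http://127.0.0.1:%d/score" % server.server_address[1]
    print(request_score(url, {"tenure_months": 12, "support_calls": 3}))
    server.shutdown()
    server.server_close()
```

The client only needs to know the URL and the JSON shape, so the same call could come from a CRM, a website, or a batch job. A hosted service would add the pieces this sketch leaves out, such as authentication, scaling, and model versioning.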
Despite all of the complexity, there is one philosophy that ties all of these topics together: using data to support business decisions. The critical focus should be on how data can be leveraged to solve problems, as opposed to obsessing over technology, tools, and techniques. Stay tuned for more detailed blogs on the topics whose surface we have barely scratched here.