In the previous installments of this series on the rise of Enterprise Analytics, we discussed how prioritizing actual analytical use cases over perceived requirements will accelerate the enterprise’s learning curve through rapid data discovery in a standardized yet dynamic data ecosystem. Rapid data discovery is one of the keys to the success of Enterprise Analytics, but its observations won’t have a lasting impact without sound statistical confidence.
Rigorous data science, supported by faster computing and sophisticated algorithms, will turn the insights from rapid data discovery, which focus on short-term needs, into more strategic analysis that affects the whole organization. Indeed, Gartner predicts that “the future of business will be defined by how well companies, organizations and governments use technology to engage with partners and customers across a wide range of digitalized processes(1).” This engagement will be underpinned by the communication of well-founded statistical evidence.
The Pace of Power
Average CPU performance today is nearly five times what it was ten years ago(2). This ever-increasing computing power has put the raw materials for sophisticated models and techniques within reach of the masses. The information hidden in the “connected everything” and the IoT can be unlocked with algorithms made possible by the sheer strength of our machines.
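A quick back-of-the-envelope calculation puts the five-fold figure in annual terms. Assuming, as a simplification, that the gain compounded evenly over the decade (real benchmark gains arrive in uneven steps), the implied annual growth rate g is:

```latex
(1 + g)^{10} = 5
\quad\Longrightarrow\quad
g = 5^{1/10} - 1 \approx 1.175 - 1 = 0.175
```

That is roughly 17.5% more performance every year, compounding.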
Figure: CPU Benchmarks by Release Date
Gartner agrees that “the rise of advanced analytics, the Internet of Things (IoT), and mainstream adoption of big data technologies and initiatives create both the raw material and the intelligence that algorithms can exploit(3).” This computing capacity is positioning the data scientist to become an increasingly integral hub of the Enterprise Analytics ecosystem.
Increasing Computing Power Permits New Techniques
Gartner predicts that within the next two years, more than half of large organizations globally will compete using proprietary algorithms(4). Computing power, data volume, skills, and business requirements are aligning to make data science a necessary pillar of Enterprise Analytics.
Latency previously made statistics and predictive analytics less valuable to enterprise decision-makers. Waiting hours for a simple prediction to run is not how the data-driven enterprise wants to operate. Modern computing power, coupled with a healthy data ecosystem, shifts the data scientist’s top priority from making a test run more efficiently to deciding which iteration to try next. Advanced computing power allows data scientists to apply the scientific method more effectively.
Faster computing means analysts can use more sophisticated and computationally intensive models such as random forests, support vector machines, neural networks, and deep neural networks. It also lets them apply resource-intensive techniques: considering more input parameters and candidate variables for selection, training on larger datasets, and running more thorough cross-validation, model selection, and permutations. What was out of reach until recently is now at the fingertips of most enterprises and individuals, whether on a laptop, a server, or in the cloud.
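To make the cross-validation and model-selection loop concrete, here is a minimal sketch in plain Python. It uses k-nearest-neighbors regression as a stand-in for the heavier models named above, chosen only because it fits in a few lines of standard-library code, and it selects the neighborhood size k by 5-fold cross-validation on synthetic data. The function names and the data are illustrative assumptions, not part of any particular product.

```python
import math
import random
from statistics import mean


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle row indices once, then deal them into n_folds near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cv_mse(data, k, n_folds=5, seed=0):
    """Held-out mean squared error of k-NN, averaged over the folds."""
    fold_errors = []
    for fold in k_fold_indices(len(data), n_folds, seed):
        held_out = set(fold)
        train = [point for i, point in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        fold_errors.append(mean((y - knn_predict(train, x, k)) ** 2 for x, y in test))
    return mean(fold_errors)


def select_k(data, candidates, n_folds=5, seed=0):
    """Score every candidate k on the same folds; return the best k and all scores."""
    scores = {k: cv_mse(data, k, n_folds, seed) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Synthetic example: a noisy sine curve (hypothetical data).
    rng = random.Random(42)
    xs = [rng.uniform(0, 6) for _ in range(200)]
    data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]
    best_k, scores = select_k(data, candidates=[1, 3, 5, 10, 25, 50])
    for k, score in sorted(scores.items()):
        print(f"k={k:>2}  cv_mse={score:.4f}")
    print("selected k:", best_k)
```

Every extra candidate multiplies the work by the number of folds, which is exactly the kind of cost that cheaper computing makes tolerable.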
Computing Power to the People and the Emergence of the Citizen Data Scientist
In our previous post, we discussed the rise of the citizen data scientist in the Enterprise Analytics ecosystem. While citizen data scientists will excel at pinpointing anomalies, forming hypotheses about trends, and blending data sources to tell compelling stories, their real strength will be in sharing these insights with others in the analytics organization.
The data scientist is poised to apply advanced computing power to explore citizen-generated hypotheses and initial stories with rigorous statistical precision. It will be the data science team, armed with incredible computing power and statistical know-how, that forecasts trends or determines causality so that enterprise leaders can make decisions. The data scientist will develop methods to confirm, predict, diagnose, and underscore the citizen’s findings with statistical confidence. If the citizen data scientist is the lightning, the data scientist will be the thunder.
We already see this model taking shape in our own engagements. In one case, the client had a known problem: a particular field was frequently being defaulted to an incorrect value. People at all levels of the enterprise accepted this as an annoying but uncontrollable data quality nuisance. However, a few analysts noticed other anomalies that commonly appeared in those same records. This preliminary observation encouraged the organization’s decision makers, suggesting that something could, in fact, be done to improve the data quality.
Our data science experts used the predictors the citizen data scientists had discovered, along with a few other key factors, to successfully identify when that field had been populated incorrectly. This insight was invaluable for tempering the conclusions of previous analyses, and it could serve in future analyses as a check on the data flowing from source systems into the enterprise data warehouse.
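The engagement's actual predictors are not described here, so what follows is only a hypothetical sketch of the general approach: treat the anomalies the analysts found as 0/1 features and fit a logistic regression that estimates the probability that the field was wrongly defaulted. The two features (`anomaly_a`, `anomaly_b`), the labeled counts, and the 0.5 flagging threshold are all invented for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, features):
    """weights[0] is the intercept; the remaining weights pair up with the 0/1 features."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the average log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict_proba(weights, features) - label
            grad[0] += error
            for j, x in enumerate(features, start=1):
                grad[j] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def make_training_data():
    """Hypothetical labeled sample: (anomaly_a, anomaly_b) -> 1 if the field was wrongly defaulted."""
    counts = {  # (features, label): number of records
        ((1, 1), 1): 4, ((1, 1), 0): 1,
        ((1, 0), 1): 2, ((1, 0), 0): 3,
        ((0, 1), 1): 1, ((0, 1), 0): 4,
        ((0, 0), 1): 1, ((0, 0), 0): 9,
    }
    rows, labels = [], []
    for (features, label), count in counts.items():
        rows.extend(list(features) for _ in range(count))
        labels.extend([label] * count)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_training_data()
    weights = train_logistic(rows, labels)
    for features in ([1, 1], [1, 0], [0, 1], [0, 0]):
        p = predict_proba(weights, features)
        flag = "FLAG" if p >= 0.5 else "ok"  # hypothetical threshold
        print(features, f"p={p:.2f}", flag)
```

A probability rather than a hard rule lets downstream analyses weight or exclude suspect records instead of silently trusting the defaulted value.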
The Unique Role of the Data Science Team
Some analytic functions, such as data exploration, data quality verification, data visualization, and reporting, are less scientifically intensive and can be carried out by citizen data scientists(5). The primary role of the data scientist, however, is to investigate using the tools of statistical significance.
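One standard tool for asking whether an observed difference is statistically significant is a permutation test. The sketch below is an illustration, not a prescribed method: it compares a metric between records a citizen analyst flagged and all other records, repeatedly shuffling the group labels and counting how often chance alone produces a difference in means at least as large as the one observed. The sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, p-value). The p-value is the share of random
    relabelings whose absolute mean difference is at least as large as the
    observed one, with the usual +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign records to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical metric for flagged records vs. all other records.
    flagged = [14.1, 15.3, 13.8, 16.0, 15.1, 14.7]
    others = [12.2, 12.9, 13.1, 11.8, 12.5, 13.4, 12.0]
    diff, p = permutation_test(flagged, others)
    print(f"observed difference = {diff:.2f}, p-value ~= {p:.4f}")
```

The test makes no distributional assumptions; its cost is raw computation, one refit of the statistic per shuffle.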
Successful data science in the enterprise will require a well-crafted team of individuals with different strengths and experience. Anyone who has worked with analytical techniques knows there are two main arms of data science: predictive and explanatory(6). Data scientists will be expected to improve the predictive power of source data, perform exploratory model development with randomized variables as benchmarks, execute model iterations, and combine N-fold cross-validation with bootstrapping to confirm that the model is sufficiently general and stable(7). Their skill and mathematical insight will support decisions, shared with executives and investors, that carry the confidence of years of algorithmic history.
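Bootstrapping, one of the stability checks mentioned above, can be sketched in a few lines: resample the rows with replacement many times, refit the model each time, and look at the spread of the refitted coefficient. In this minimal example the "model" is a single least-squares slope and the data are synthetic; a narrow percentile interval suggests a stable estimate, while a wide one suggests the result depends heavily on which records happened to be sampled.

```python
import random
from statistics import mean


def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sxx


def bootstrap_slope_interval(points, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope: resample rows with replacement and refit."""
    rng = random.Random(seed)
    n = len(points)
    slopes = []
    for _ in range(n_boot):
        sample = [points[rng.randrange(n)] for _ in range(n)]
        if len({x for x, _ in sample}) < 2:
            continue  # every resampled x identical: slope undefined, skip this draw
        slopes.append(ols_slope(sample))
    slopes.sort()
    lower = slopes[int(alpha / 2 * len(slopes))]
    upper = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic data with a true slope of 2 (hypothetical example).
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    data = [(x, 2 * x + rng.gauss(0, 1)) for x in xs]
    low, high = bootstrap_slope_interval(data)
    print(f"slope = {ols_slope(data):.3f}, 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

In practice the same resampling would wrap a full model fit and be combined with N-fold cross-validation, as the paragraph above suggests.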
Data Scientists are Driving the Enterprise
Phenomenal computing power, analytically relevant centralized data stores, and the volume, variety, and velocity of data challenge the modern enterprise to step into what Gartner calls the “new economics of connections - the creation of value from the increased density of interactions between ‘things,’ people and businesses in interrelated social, information and computer networks(8).”
Enterprise Analytics is structured around an accessible, analytics-driven data platform that is strengthened by IT’s centralization strategies. We have argued that this should be accomplished by working together to solve the biggest analytics problems rather than chasing perceived requirements. Maintaining that business-led focus is also important for the data science arm of the enterprise. As business users and other citizen data scientists raise questions, the data science team should prioritize these insights to help the enterprise grow exponentially. Perceived needs and wants may not be what is most needed to advance the ecosystem and its data-driven culture.
Communication about priorities is crucial: only with the strengths of well-trained data scientists will the enterprise have the confidence to take advantage of the intuitive user experiences that give citizens across the enterprise inroads to data. The next step in this ecosystem will be communication and the social realization of the fruits of these efforts: a symbiotic, data-sharing economy.
This post is the third in a series exploring the five trends that Inquidia sees in the business intelligence marketplace.
Data scientists are empowered by faster and stronger algorithms and techniques to test hypotheses.
The age of search and the intelligent web conditions users to expect instant access to data for exploration and development of hypotheses.
Enterprises will use storytelling, data drilling, and peer-powered collaboration to generate hypotheses and make decisions.
(1) Gartner, https://www.gartner.com/doc/3180618?plc=ddp
(2) SPEC CPU2006 results, http://www.spec.org/cpu2006/results/ (CPUINT and CPUFP benchmarks, 2,163 chips tested through March 2016; Inquidia Road to Self Service Analytics presentation, March 2016)
(3) Gartner, https://www.gartner.com/doc/3180618?plc=ddp
(4) Gartner, https://www.gartner.com/doc/3180618?plc=ddp
(7) Inquidia Road to Self Service Analytics presentation, March 2016
(8) Gartner, https://www.gartner.com/doc/3180618?plc=ddp