How is Data Science Different from Data Analytics?

Wednesday, September 3, 2014 - 16:30

It's common for firms to have a variety of data sources and systems in place which, in isolation, are serving them quite well. However, the advance of technology has made it possible to measure much more, and to store a much richer representation of the history of your business. This places new demands on data systems and leads data consumers to ask different sorts of questions.

This article discusses what we see as the distinguishing characteristics of data analytics and data science. We hope it will help you understand these needs more clearly, and align your systems and culture to serve both practices well. The two disciplines differ in data content, staffing, nature of business process, and tools used.


Data/Content

Data Analytics focuses on more operational and tactical data usage, whereas Data Science tends to focus on more abstract data usage.

| Data Analytics | Data Science |
| --- | --- |
| What do we know? | What could we know? |
| Intentional collection. | Leverage any available "data exhaust". |
| Internal data. | Leverage any accessible data from anywhere. |
| Usually structured, and the collection system is designed as such. | Data must be subjected to feature extraction before it will be useful. |
| Historical data required. | Historical data possibly not valuable or feasible. |
| Conclusions are of predictable value. | Conclusions may be surprising, and of hard-to-predict value. |
| Data volume generally modest, making storage and computation straightforward. | Data volume may be large enough to necessitate a concerted strategy. |
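
To make "feature extraction" concrete, here is a minimal Python sketch. It assumes a made-up web-server log format as the "data exhaust"; the log lines, the field layout, and the `extract_features` helper are all hypothetical and only illustrate the idea of turning raw, loosely structured text into fields that can be analyzed.

```python
import re
from collections import Counter

# Hypothetical log lines standing in for "data exhaust" (format is assumed).
RAW_LOG = [
    "2014-09-03T16:30:01 GET /products/42 200 0.123",
    "2014-09-03T16:30:05 GET /cart 200 0.310",
    "2014-09-03T16:31:12 POST /checkout 500 1.742",
    "malformed line that the parser should skip",
]

# timestamp, HTTP method, path, status code, response time in seconds
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<secs>\d+\.\d+)$"
)


def extract_features(line):
    """Turn one raw log line into a dict of features, or None if unparseable."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "hour": int(m["ts"][11:13]),                     # hour of day
        "method": m["method"],
        "section": m["path"].strip("/").split("/")[0],   # first path segment
        "is_error": int(m["status"]) >= 500,             # server-side failure
        "seconds": float(m["secs"]),
    }


# Keep only the lines that parsed; unparseable lines are dropped here.
features = [f for f in map(extract_features, RAW_LOG) if f is not None]
sections = Counter(f["section"] for f in features)
```

In practice, the unparseable lines are worth counting rather than silently dropping, since their share is itself a signal about the quality of the source.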


Staffing

With Data Analytics, staff focus on structured data analysis and stay within a limited set of technologies, while Data Science tends to require staff who are adept at multiple, sometimes disparate technologies.

| Data Analytics | Data Science |
| --- | --- |
| Specialist who is familiar with the BI tools of choice, data warehouse, and reporting. | Generalist who is familiar with a wide array of tools, but also statistics, machine learning, domain knowledge, data warehousing, various database technologies, information retrieval, and systems architecture. |
| Uses a handful of often highly structured BI platforms, which will often suffice for the life of a project. | Uses a wider variety of tools, and the tool-set is likely to evolve with the project. Tools are more likely to be open source. |
| Staffing needs are high at the beginning of a project, and then predictable thereafter. | Less predictable staffing needs, as the task resembles research, i.e., intuition guides experiments, some of which will fail to reject the null hypothesis. |
| Division of labor. | Interdisciplinary group work. |


Business Process

Data Analytics is heavily focused on building structured data analytics processes to measure known metrics, while Data Science is more often geared around heavy discovery and iteration.

| Data Analytics | Data Science |
| --- | --- |
| Governed data availability. | Begin with a “data lake” with poorly understood content. |
| Focus on executing business process. | Focus on discovering value. |
| Institutionalized data quality. | Post hoc work to establish data quality. |
| Quality established by process. | Quality established by inspection. |
| Statements and conclusions are deterministic. | Statements and conclusions are probabilistic. |
| Consumed by finance and performance management. | Consumed by business and technology strategy, and product management. |
| Results must be perfect every time. | Data science process will discover and subsequently correct mistakes. |
| IT is the primary service provider. | IT provides data and tools access to data scientists. |
| Projects are typically longer and more structured. | Shorter, more agile projects which follow the hypothesize-test-validate or dismiss flow. |
| Projects involve a breadth of input from a variety of stakeholders, primarily by stating their needs. | Projects driven by those familiar with the data, looking for valuable relationships which they suspect may exist in the data. |
| “I need to know X.” | “I suspect X might be related to Y. Is that so?” |
| An answer is guaranteed to exist. | The data may fail to reject the null hypothesis even when it carries statistically robust information about the relationship in question, and even that robustness is not guaranteed. |
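
The question "I suspect X might be related to Y. Is that so?" can be sketched as a small hypothesize-test-validate-or-dismiss loop. The Python below is one possible illustration, not a prescribed method: it uses a permutation test on the Pearson correlation, and the data series, the function names, and the number of trials are all assumptions made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data looks at least as related as the real data.

    Null hypothesis: X and Y are unrelated, so any pairing of the Y values
    with the X values is equally likely.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Hypothetical data: one series that tracks X closely, one that does not.
xs = list(range(1, 13))
ys_related = [3, 5, 6, 9, 11, 12, 15, 16, 19, 20, 23, 25]
ys_unrelated = [7, 2, 9, 4, 8, 1, 6, 3, 9, 2, 7, 5]

p_related = permutation_p_value(xs, ys_related)
p_unrelated = permutation_p_value(xs, ys_unrelated)
```

A small p-value is evidence against the null hypothesis. A large one, as with the second series, means only that the data failed to reject it, which is the "dismiss" branch of the flow and not proof that no relationship exists.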


Tools

Data Science typically involves more hands-on code writing and irregular computing, as part of the exploratory nature of the field. Data Analytics works to keep the toolset and computing more controlled and limited, to ensure ease of support and stability.

| Data Analytics | Data Science |
| --- | --- |
| RDBMS usually meets the needs. | Potentially many diverse data stores. Sometimes, specialized databases such as a column store, scalable non-ACID document database, external/internal APIs, or big data tools. |
| ETL tool with maintainable process (e.g., Pentaho/Informatica/Microstrategy). | Specialized, potentially one-off processes which, if necessary, are productionized (e.g., Py.Pandas, R.plyr). |
| Analysis tool supporting OLAP, general business user self-service access tool (e.g., Pentaho/Informatica/Microstrategy). | Py.SciPy, Py.NumPy, Py.scikit-learn, R -- tools allowing arbitrary operations on the data with sophisticated math, statistics, and learning libraries. |
| Visualization tools integrated with the BI platform (e.g., Pentaho/Informatica/Microstrategy). | Py.matplotlib, R.ggplot2, D3.js -- allowing huge flexibility in the context and content of visualizations. |
| Reporting and dashboard tools integrated with the BI platform (e.g., Pentaho/Informatica/Microstrategy). | IPython notebook, RStudio, custom apps used to create arbitrary hosted/sharable, potentially interactive dashboards and displays. |
| Computing needs are predictable. | Computing needs are irregular. |
| Algorithmically simple. | Often not algorithmically simple. |


Data science as a practice will certainly change over time. As a conscientious technology manager, you will want to position your firm to meet the needs of both data analytics and data science moving forward. There are systems architectures, practices, and cultural fixtures which represent movement in that direction.

Our experience has been that needs vary greatly between firms, depending on the current state of their technology stack, the organization of their data, the potential of their data exhaust, and their exposure to statistics and probabilistic decision-making. Each firm may have a different level of interest and maturity in each of these areas. Certainly, an organization that is truly data-driven attends to each of these areas, and does not favor one at the expense of the others.

Contact us today to find out how Inquidia can show you how to collect, integrate and enrich your data. We do data. You can, too.
