In our experience, a business's most profound need is to have its data organized, standardized, and consolidated, with systems in place for data quality. In the vernacular, these tasks are referred to as data munging. The attributes of interest for data munging tools are:
- productivity: how quickly can the task be completed
- maintainability: how robust will the design be to changes, and how easy will it be to adapt to changing data sources and systems
- scalability: how easily can the system handle larger data volumes, latency constraints, and so on
- enterprise readiness: is it feasible to implement a data system that is automatic and fault tolerant
Each of the tools we service has a different combination of these attributes, which makes each one ideal for a different task. R-data.table, Python-Pandas, and SAS (or WPS) are very productive and powerful, which is an advantage for data science and free-form data exploration. Pentaho Data Integration and Alteryx trade power for maintainability and enterprise readiness. Pentaho Data Integration and Python have means for "virtually infinite" scaling.
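To make "organized, standardized, and consolidated with systems for data quality" concrete, here is a minimal sketch of those steps in plain Python (standard library only, so it runs anywhere). The two source systems, the field names, and the country lookup table are all invented for illustration; a real job in any of the tools above would look different in detail but follow the same shape.

```python
# Hypothetical records from two source systems (invented for illustration).
crm_rows = [
    {"customer_id": "001", "email": " Ann@Example.com ", "country": "us"},
    {"customer_id": "002", "email": "bob@example.com", "country": "USA"},
]
billing_rows = [
    {"customer_id": "002", "email": "BOB@example.com", "country": "United States"},
    {"customer_id": "003", "email": "", "country": "ca"},
]

# Assumed lookup table mapping free-form country text to one standard code.
COUNTRY_CODES = {"us": "US", "usa": "US", "united states": "US", "ca": "CA"}


def standardize(row):
    """Return a copy of the row with trimmed, case-normalized fields."""
    return {
        "customer_id": row["customer_id"].strip(),
        "email": row["email"].strip().lower(),
        "country": COUNTRY_CODES.get(row["country"].strip().lower(), "UNKNOWN"),
    }


def consolidate(*sources):
    """Merge sources into one record per customer_id; the first occurrence wins."""
    merged = {}
    for source in sources:
        for row in map(standardize, source):
            merged.setdefault(row["customer_id"], row)
    return list(merged.values())


def quality_issues(rows):
    """List (customer_id, problem) pairs for rows that fail basic checks."""
    issues = []
    for row in rows:
        if "@" not in row["email"]:
            issues.append((row["customer_id"], "missing or invalid email"))
        if row["country"] == "UNKNOWN":
            issues.append((row["customer_id"], "unrecognized country"))
    return issues


customers = consolidate(crm_rows, billing_rows)
issues = quality_issues(customers)
print(customers)
print(issues)
```

Even in a toy like this, the attributes listed above show up: the lookup table and the "first occurrence wins" rule are the parts that need maintaining as sources change, and the quality report is the piece that has to run unattended in an enterprise setting.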
We have deployed these tools in a variety of circumstances and will share our experiences with you.
Pentaho Data Integration