R, the Integration Language?

Tuesday, January 21, 2014 - 09:30

My company, Inquidia, is a consulting partner of hot visualization vendor Tableau. We’re quite excited about the latest product release for many reasons, perhaps the most significant being the long-awaited integration with the R statistical platform. In the Fall, I had the opportunity to participate in the 8.1 Beta Program that provided access to Tableau product management/support for my often-dumb questions about the new functionality.

The Tableau-R integration allows developers to create workbook variables using the R language and statistical procedures. An added benefit is that the R computations generally behave as the analyst “wants” when filtering or slicing and dicing the data. I’ve successfully used simple R code to drive my split-apply-combine stock portfolio return normalization and percent change calculations. The uninitiated can learn a lot from the highly-informative R and Tableau: Data Science at the Speed of Thought webinar by Bora Beran.
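Under the covers, Tableau ships data to a running Rserve instance through its SCRIPT_* calculated-field functions. A minimal sketch of the kind of return-normalization field I describe, assuming a [Close] measure and a table calculation computed along the date axis:

    // Tableau calculated field: index each stock's closing price to its
    // first value; .arg1 arrives in R as a vector of the partition's values
    SCRIPT_REAL("x <- .arg1; x / x[1]", SUM([Close]))

The percent-change variant is a one-line change to the embedded R expression.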

Integration with R now appears to be a sine qua non strategy for analytics tool vendors. I’m currently investigating KNIME, an open source “user-friendly graphical workbench for the cradle-to-grave analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting.” KNIME is architected around a visual workflow metaphor and has much the look of a data integration tool, with drag-and-drop node folders such as IO, Database, Data Manipulation, Mining, Reporting, Statistics, etc. An R node is easily added.

New to the software, I find myself turning to R for many of the IO, Data Manipulation and Statistics tasks I’m sure could be accomplished with core KNIME functionality. With the R plug-in, KNIME can powerfully combine text files, KNIME tables and R coding in its workflow. For my KNIME stock portfolio illustration, I use the R data.table package to execute “split/apply/combine” logic and create a second, “pivoted”, data set available for subsequent workflow steps, as sketched below. Alas, I’m reluctant to “share” my flows, fearing the cognoscenti would question my sanity for doing in R what could be done directly in KNIME.
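A sketch of what that R snippet node might look like (knime.in and knime.out are the variables KNIME’s R nodes use to pass tables in and out; the column names are hypothetical):

    library(data.table)

    # knime.in: the table KNIME hands the R node; assume columns
    # symbol, date and close
    dt <- as.data.table(knime.in)
    setkey(dt, symbol, date)

    # split/apply/combine: per-symbol normalized return and percent change
    dt[, norm := close / close[1], by = symbol]
    dt[, pct  := close / c(NA, head(close, -1)) - 1, by = symbol]

    # the second, "pivoted" data set: one column of normalized returns per symbol
    wide <- dcast(dt, date ~ symbol, value.var = "norm")

    # hand the result back to the KNIME workflow
    knime.out <- as.data.frame(wide)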

Two years ago, I wrote on the integration of R with visualization tool Omniscope by Visokio. The Omniscope architecture consists of two major components: “DataExplorer, which provides data discovery & analysis, reporting and dashboarding, and DataManager, which offers tools to build and manage data sets. DataManager is essentially a poor man’s ETL tool, providing a drag and drop visual workflow to drive data extraction, merge, transformation and delivery on a small scale.”

R plugs in seamlessly to DataManager, allowing the analyst to access R’s full complement of language and statistical features. “As an illustration, I load a comma-delimited file and then link the contents to an R script that pivots the text data using functions from the R reshape package. After a few more R programming statements, the “munged” data is returned to DataExplorer for discovery. I’ve tried similarly reshaping/filtering/enhancing input data from several other NBER files, all with positive results… My next test was to link several R Operations tasks in succession: the first to read an already-input stacked time series data file and create new variables; the second to reshape and restrict the resulting stream; and the third to invoke the Holt-Winters forecasting function to “predict” the next 30 days of measurements for each of the selected series. This, too, worked well.”
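The R side of those Operations tasks is straightforward. A minimal sketch using the reshape package and base R’s HoltWinters, with an illustrative stacked data frame standing in for the real file:

    library(reshape)

    # a stacked time series frame of the assumed shape: series, date, value
    stacked <- data.frame(
      series = rep(c("A", "B"), each = 120),
      date   = rep(seq(as.Date("2013-01-01"), by = "day", length.out = 120), 2),
      value  = c(cumsum(rnorm(120)), cumsum(rnorm(120)))
    )

    # reshape: melt, then cast to one column per series
    molten <- melt(stacked, id = c("series", "date"))
    wide   <- cast(molten, date ~ series)

    # Holt-Winters "prediction" of the next 30 days for one series
    hw <- HoltWinters(ts(wide$A), gamma = FALSE)   # non-seasonal fit
    predict(hw, n.ahead = 30)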

And 3.5 years ago, I lauded the integration of R with the Spotfire visualization platform for many of the same reasons. “The ability to execute R functions in the background and seamlessly move data between R and Spotfire is key. In fact, one can do all data manipulation for Spotfire in R scripts, then simply push the results into Spotfire with the interface. As the models got more complicated with additional factors, I combined the predictions with the base data using Spotfire’s trellis graphs, the powerful dimensional visualization metaphor pioneered in S/R. At the end of the week I was convinced: I had my statistical cake and was eating my interactive visualization too.”
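The trellis metaphor itself is easy to demonstrate in R with the lattice package, the open-source descendant of S trellis graphics. Data and names here are purely illustrative:

    library(lattice)

    # predictions alongside base data, panelled by a conditioning factor:
    # the dimensional "small multiples" view described above
    d <- expand.grid(x = 1:50, group = c("east", "west", "north"))
    d$actual    <- 0.5 * d$x + rnorm(nrow(d))
    d$predicted <- 0.5 * d$x

    xyplot(actual + predicted ~ x | group, data = d, type = "l",
           auto.key = list(points = FALSE, lines = TRUE))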

I’m not the typical Tableau, KNIME, Omniscope or Spotfire developer. I’m pretty lazy: rather than spend time assimilating a new visual product’s language and syntax that won’t scale for me, I’d just as soon code in something I know and can use in multiple settings. Let others do the dirty work.

Ironically, where R is integrated with other analytics tools, I use it as much for its base language as I do for statistical procedures. Be it for munging, wrangling, reshaping or other data science tasks, R offers a powerful and expressive vector syntax that can be applied in many contexts. Difficult to learn and quirky, yes – but a welcome extension to emerging analytics platforms.
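A one-liner makes the point: the percent change that drives the portfolio examples above is a single vectorized expression, with no loop in sight (close is a hypothetical price vector):

    # percent change across an entire price series in one vectorized stroke
    close <- c(100, 102, 101, 105, 107)        # illustrative prices
    pct   <- diff(close) / head(close, -1)     # (p[t] - p[t-1]) / p[t-1]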


Originally published in Information Management.
