What SAS Doesn't Want You To Read: WPS 3.1, SAS, R and Medicare
It just had to be serendipity. Early one morning several weeks ago, I had two emails awaiting me -- one from the Wire headlining an article on the results of analyses of Medicare data that the government's made available to the public; and a second from analytics vendor WPL announcing version 3.1 of its statistical software WPS that “can run programs written in the language of SAS.” That email also directed me to update my demo license and install 3.1.
I spent parts of twenty years programming with SAS, finally abandoning the language for R twelve years ago in a snit over what I felt were excessive fees for partner demo licenses. That switch turns out to be the best statistical move I ever made.
What I loved about SAS in the early 80's, when its statistical and data management/access capabilities were market-leading, I was less than enthralled with in 2002, when quality, open source alternatives like R and Python were available for free.
In the early 90's, before the ascent of Extract, Transform and Load (ETL) technologies, I sometimes used SAS to build relational “data marts”. Once Perl, Python and ETL platforms began to take hold several years later, however, I preferred using them instead to build the data stores. I suspect much of the expensive deployment of SAS today is for legacy data step programs that could readily be replaced with cheaper alternatives.
Indeed, R and Python are 180 degrees from the a-la-carte license plus annual subscription burden of SAS. And for me the integrated R language, with its vector and object orientation, is a big programming step up from SAS's now-dated data step + procs + macro metaphor.
Yet SAS remains the big gorilla of corporate statistical computing. Vanilla R is limited in the size of problems it can address by its in-memory architecture, and commercial R vendors haven't provided a comprehensive-enough response. Also, give SAS credit for a masterful job building/maintaining its franchise over the years.
It will be interesting to see what happens when SAS baby boomers retire. Many believe that, just as becoming the lingua franca of academic statistical computing in the 80's fueled SAS's growth in the corporate world, R's current prevalence in academia is triggering a similar business tide. This is certainly true for data science start-ups, where open source software prevails. Statistical millennials prefer R/Python.
For the SAS customers who complain to me of the high cost of the software they cannot abandon, I recommend investigating SAS-clone WPS. One VC wag describes WPS as methadone for the SAS opiate. WPS supports most of the data step and macro programming language of Base SAS, also covering the basic stats procs and adding new ones each release. All of this for less than 1/3 the list cost of SAS. And it appears WPL has prevailed against pesky SAS copyright litigation. To my thinking, for customers with a heavy investment in SAS data step programming, consideration of WPS as a low-cost alternative is a no-brainer: it can both lower their software costs and give them leverage in their relationship with SAS.
The latest release of WPS comes with many enhancements, not the least of which is the new proc R, which allows SAS code to inter-operate with R. SAS-ers might counter they've supported R for a long time. Yes, but an additional module beyond Base, proc IML, must first be licensed – and paid for.
When I downloaded and unzipped the Medicare file, I found the data, a dictionary and a script containing a SAS infile statement. After setting up Medicare program and data directory structures, I tweaked the SAS code and ran it in WPS, creating a 9M+ record, 27 attribute WPS-SAS data set that would source additional analyses. The load from the included code was a bit lengthy, but the proc import I then tried completed in less than a minute and a half. Subsequent access has been very speedy.
From the “big” data set, I was able to do the types of things most SAS programmers do – proc contents, proc datasets, proc format, proc sql, proc freq, proc summary, proc tabulate, proc sort, proc import/export, etc. – as well as create subsetted and filtered work data sets. Just like the old days.
What I really wished to do, though, was put proc R through its paces, which would hopefully let me use the preferred R lattice and ggplot2 graphics packages in my WPS/SAS scripts. proc R delivered. I was able to “export” a WPS data set to R and then invoke lattice and other R code on the new data frame in a special submit-endsubmit block. Just for kicks I saved the filtered 9M record data frame as an rdata file. All operations except the last were encouragingly fast.
That SAS is still the stats leader can be inferred from the included infile script with the Medicare data, as well as the availability of SAS data sets in addition to CSV files for other government data such as the American Community Survey. WPS handles the 15M record, 200+ attribute ACS SAS data set with aplomb as well.
I like the combination of WPS and R for statistical analysis. Loath as I am to admit it, there are many things SAS does better than R. With WPS 3.1, I can choose among the best of both platforms for statistical, data management and graphical functionality. And WPS provides that SAS functionality at significantly lower price points than SAS. Frustrated customers should take a look.