Containing Big Data: What Size is Your Suitcase?

Thursday, January 12, 2017 - 10:15
My wife and I decided to exchange suitcases as primary gifts this past Christmas, at last replacing the raggedy ones we've suffered with for over 12 months. For me, a 21-inch spinner carry-on seemed best, while my wife initially opted for the considerably larger 25-inch model. When I purchased the suitcases and brought them home on December 23, however, she had a change of heart, noting that the 25-inch was larger than she needed for most of the trips she takes. So I exchanged the 25 for a 23-inch, and we were both happy holiday travelers.

While the new suitcases will serve our needs for most travel plans, we know we'll have to make accommodations for longer trips, such as a planned two-week European junket in the spring. The thinking is that we'll either purchase or borrow additional luggage for the longer trips rather than “solve” the suitcase problem once and for all with bags larger than we generally need. In the end, we decided to optimize for 90% of travel at the expense of the extreme 10%.

The suitcase purchase metaphor extends to several work-related situations from December. The first was my annual get-together with a friend from stats grad school. Every year, it seems, we “discuss” the relative merits of his statistical software choice, SAS, versus mine, either R or Python. And every year his opening salvo is that my open source solutions are limited in the size of data sets they can process by the available memory on my machine, while his choice exploits virtual memory to theoretically deal with much larger data.

I counter that my 64 GB notebook readily accommodates 90% of my statistical work, recently handling a 50M+ record, 70-attribute data set in R with aplomb. Moreover, I argue that I have PostgreSQL and MonetDB databases in my pocket to serve R/Python for larger-than-memory data. Finally, I note that I've readily deployed Spark in the cloud for terabytes of data, and by the way, my options are much cheaper than his. Incidentally, what's the largest data set he statistically analyzed in 2016? A measly 2 GB! Alas, I don't think my friend was persuaded by my 90% logic.
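For readers who want to check the suitcase math, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions of mine rather than anything measured: every attribute is treated as an 8-byte numeric value, and the working-copy headroom factor of 2 is a guess at how much extra room an analysis session might want. The function names are hypothetical helpers, not part of any library.

```python
def in_memory_size_gb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Approximate size of a dense numeric table in decimal gigabytes.

    Assumes every cell is stored as a fixed-width value (8 bytes by default,
    i.e. a double). Real data sets mix types, so treat this as a rough guide.
    """
    return rows * columns * bytes_per_value / 1e9


def fits_in_memory(size_gb: float, ram_gb: float, headroom: float = 2.0) -> bool:
    """Return True if the data, times an assumed headroom factor, fits in RAM.

    The headroom factor is a guessed allowance for temporary copies made
    during analysis; it is not a measured figure.
    """
    return size_gb * headroom <= ram_gb


# The data set described above: 50M records by 70 attributes on a 64 GB notebook.
size = in_memory_size_gb(50_000_000, 70)
print(f"Estimated size: {size:.1f} GB; fits in 64 GB: {fits_in_memory(size, 64)}")
```

Under those assumptions the table comes to about 28 GB, which leaves a 64 GB machine with room for a working copy. A data set with wide text columns or many intermediate copies could easily tip the other way, which is where the database and cloud options come in.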

The second illustration comes from a customer with whom my company, Inquidia Consulting, just completed a brief planning engagement. In business for just three years, this company sells data and analytics as its products. They turned to Inquidia to help develop a product roadmap for the next 5-7 years.

I'm convinced that some companies not-so-secretly wish to be big data, even when they're not. The customer's CEO touted a willingness to build a beefy Hadoop cluster infrastructure early on, but while Inquidia likes nothing better than implementing Hadoop, our requirements-gathering findings generously estimated no more than 10 TB of enterprise data after five years. Not exactly big data, and certainly not a requirement warranting an immediate solution driven by 10% planning.

The good news is that a combination of on-premises and cloud-based infrastructures can be used to manage the risks of a 90-10 solution. The customer's once-daily computation needs would be very demanding, but easily satisfied by a use-when-needed cloud-based cluster.

To address the needs of modest data with regular but short bursts of intense computation, Inquidia proposed an architecture that leverages on-premises open source database storage and cloud-based computation immediately, evolving to cloud-based analytic databases and computation in the intermediate term. Given the company's experience with open source relational databases and its modest cash situation, we felt this solution made the most sense. In the end, the customer did too.

A final illustration reflects my personal dissonance over whether to purchase a bulky digital SLR camera to meet the 10% of telephoto needs my smartphone cannot accommodate. Happy as I am with the quality of the smartphone camera, I still long for an answer to zoom opportunities. Maybe I'll kludge a solution where I purchase the DSLR, but use the smartphone camera for 90% of my shots.

 

