In Closing: Four Strata Winners Address Sort and R

In a previous blog I examined the new Hadoop distributions that were unveiled around the time of the Strata Santa Clara 2013 conference. So as not to give short shrift to the other announcements around the conference, here are four other winning and/or particularly creative unveilings. Note that not every related announcement was listed on the Strata Santa Clara 2013 home page, and not every announcement came out through a press release. My winners had to do with two subtle but fundamental subjects in the big data space: sorting and R. Without further ado:

SyncSort: Is sort important to Apache Hadoop in particular, and to big data in general? Sure it is, and despite its mainframe roots, SyncSort unveiled its contribution to Hadoop 2.0.3 alpha in the form of API-level, optimized sort routines to speed along development and runtime functions like MapReduce. If you have any doubt that Hadoop has edged into the mainstream, look no further than SyncSort.

Vertascale: At the other extreme of vendor age from SyncSort is Vertascale, a finalist in the Strata Startup Showcase, which unveiled its beta optimized search/indexing option for Hadoop. Citing the query slowness of Hadoop, which has been evident in ESG’s own lab tests, Vertascale promises a 1000x query speed improvement in the hope of propping up Hadoop’s native Pig and Hive. In terms of identifying pain points in the Hadoop platform, Vertascale is right on point.

Teradata: Just before Strata, Teradata took the wraps off of its latest Aster Discovery Platform release, which for a point release included a particularly long list of important enhancements. I was most taken by the tighter integration of R, whereby the world’s most popular open source language for big data statistics development now runs directly inside of the Aster database – why shouldn’t Teradata tap into the growing pool of R aficionados? Also, Aster MapReduce functions are now available through a visual user interface, which should make life far easier for code-shy data scientists and analysts. And MapReduce functions were added and augmented for two key verticals, manufacturing and financial services.

Revolution Analytics: Speaking of R, the leading analytics specialist in the industry promulgating R for the enterprise, Revolution Analytics, announced yet another front in its endeavor, this time in conjunction with HortonWorks. This effort, on its own, does not change the game for R, but it is yet another proof point that Revolution Analytics is reaching into many major big data platforms, including multiple Hadoop distributions and proprietary commercial platforms. Bottom line: If you believe that R will continue to gain seats in commercial organizations, which I do, you have to believe that Revolution Analytics is one of the best-positioned of the vendors in the big data community. And the company, founded in 2007, is no longer that young and has crossed the credibility threshold.

Next up on the big data West Coast spring conference circuit will be the I.E. Group’s Big Data Innovation Summit in San Francisco on April 11th and 12th. See many of you there.
