“R” Your Eyes Open to Open Source Visualization?
The question was raised the other day: Has anyone seen much in the way of open source visualization tools? The answer is yes. They are out there if you look hard enough, but only one has affected its market the way open source has affected other segments, such as databases, operating systems, virtualization, and browsers. Here’s a quick snapshot of open source visualization:
- Where is the open source foundation support? Most of the world’s leading open source project foundations have addressed visualization little, if at all. Apache facilitates over 100 top-level projects, but none of them clearly focuses on visualization. By comparison, Apache facilitates many database projects, such as Accumulo, Cassandra, CouchDB, Derby, and HBase. Hive, which is closely associated with Hadoop, offers analysis and ad hoc query support, but it barely qualifies as a visualization tool. For visualization, you will find only one organization doing the equivalent of what The Linux Foundation, the KVM project, or Mozilla does in its own segment (see the final bullet).
- Inexpensive commercial offerings, and loyalty to established ones. If you are truly strapped for cash, a few add-ins can power up Excel’s visualization capabilities. If you want to take the next step up, fresh market visionaries such as Datameer and Tableau, associated with but not limited to Hadoop, offer inexpensive personal editions and let you grow into workgroup and enterprise licenses. Datameer and Tableau include more than just visualization; likewise, other emerging analytics platforms that may not break the bank, such as Karmasphere and Pentaho, include strong visualization capabilities, and JasperSoft offers an open source community edition of its BI/analytics tool with some visualization features. BI/analytics long predates Hadoop, so there are plenty of long-standing analytics tools that offer rich statistics plus visualization and enjoy strong customer loyalty, such as IBM Cognos, IBM SPSS, SAS, and SAP BusinessObjects. Deep market penetration by incumbents has never stood in the way of open source, but the combination of modernized offerings from long-standing analytics suppliers and inexpensive choices from new commercial entrants has squeezed some, though not all, of the demand out of open source visualization.
- R You Good Enough? Another compelling early-stage analytics/visualization provider, Revolution Analytics, takes full advantage of the world’s most popular open source statistical environment, the R Project. More often referred to simply as “R,” it is officially part of the Free Software Foundation’s GNU project. Revolution Analytics noticed that a large number of data analysts (two million claimed) use or have used R at some point, so it beefs up the UI, framework, and services around R, aiming to be to R what Red Hat is to Linux. Plenty of university courses teach or require R. R even has its own annual conference, held since 2004, with a fair amount of commercial support. The final exclamation point on R’s impact comes from Oracle, which packages Oracle R in its Big Data Appliance along with Cloudera’s Hadoop Distribution and Cloudera Manager.
If you do a web search for “open source visualization,” you will discover a fairly rich list of .orgs, each focused primarily on a particular sub-industry or class of statistics or visualizations. In general, however, statistical algorithms, visualization options, and the ability to script or program scenarios go hand in hand, and on the open source front, R offers the richest set of those capabilities. In addition, R has the widest user base and some committed commercial aficionados. Thus, the simple answer to the original question is, “Yes, R!”