What is this C*, yet another derivative of the C programming language? No, C* is the insider's shorthand for Cassandra, the Apache top-level open source database project that has gathered enough steam to draw over 800 attendees, up from 125 two years ago, to Santa Clara last week for the third annual Cassandra Summit, hosted by DataStax. Here's a nice synopsis of the summit. For those of you not yet initiated, however, what follows summarizes the open source Cassandra movement and the DataStax commercially supported version of Cassandra.
C* Roots and Off-shoots
The Cassandra open source database project, a mere four years old, traces its developmental roots to Facebook search and uses Google's Bigtable and Amazon's Dynamo as its architectural foundations. At stable release version 1.1.3 as of August 5, 2012, Cassandra is a write-optimized NoSQL database that combines a key-value store with a hybrid columnar-row data model. Current estimates count over 1,000 Cassandra clusters in deployment, with the DataStax commercial version counting around 200 customers in production.
The no-longer-so-small, loyal, and growing set of C* believers sees several solution use cases where Cassandra offers clear benefits over an RDBMS in terms of development, speed, scale, and cost. Do-it-yourselfers can download the open source Cassandra database, and enterprises looking for management tools and support can opt for DataStax Enterprise, the commercially supported Cassandra.
The Cassandra project and the Apache Hadoop project (big data analytics) are often linked, and indeed several production examples of Cassandra being used in a big data context, notably by Disney, were on display at the summit. Cassandra excels at real-time analytics, scale-out scenarios, and mixed workloads (see the eBay and commodity trading platform cases). In addition, Cassandra combined with Apache SOLR, supported through a commercial DataStax distribution, yields a search application use case. SOLR is an open source search sub-project of the Apache Lucene project.
Cassandra/DataStax Sweet Spot?
ESG sees Cassandra as particularly appealing for scale-out, high availability (HA) scenarios, both for Web 2.0 companies (e.g., eBay, Netflix) and for Enterprise 2.0 companies (e.g., Disney). By Enterprise 2.0 I mean companies born in the brick-and-mortar era that have adapted, are adapting, or will adapt their business models to be more Web 2.0-like.
Three primary features make Cassandra most appropriate for these scenarios: (1) the lack of a master node, since it uses peer clustering and replication techniques to ensure HA; (2) support for a multi-data-center approach, since it handles a massive number of nodes and thus is disaster-recovery friendly; and (3) comfortable operation in mixed-mode, cloud, and on-premises deployments. Cassandra also offers its own limited-function query language, CQL (Cassandra Query Language), which eschews joins and sub-queries in order to meet real-time requirements.
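To make the "no master node" point concrete, here is a minimal Python sketch of peer-style replication on a hash ring. It is a toy illustration under stated assumptions, not Cassandra's actual code: the `Ring` class, the `token` helper, the node names, and the replication factor of 3 are all hypothetical, and Cassandra's real partitioners and data-center-aware replica placement are considerably more involved.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy masterless ring: every node owns a token, and no node is special."""

    def __init__(self, nodes, replication_factor=3):
        # Hypothetical default; a factor larger than the cluster is capped.
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by token so a key's owners can be found by walking clockwise.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the nodes that hold a copy of `key`."""
        # First node clockwise from the key's token, wrapping around the ring.
        start = bisect_right(self.tokens, token(key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]

    def live_replicas(self, key: str, down=()):
        """Return the replicas of `key` that are still reachable."""
        return [n for n in self.replicas(key) if n not in down]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("user:42")
# Losing one replica still leaves two peers able to serve the key.
survivors = ring.live_replicas("user:42", down={owners[0]})
```

Because any node can compute `replicas()` for any key, no single coordinator has to be consulted, which is the property that makes the peer approach attractive for HA.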
Fueling the Movement: 2.0, Mobile, Margins and Comfort
ESG believes that while there is a long list of database options to support BI and analytics in this era of Big Data, the scale-out/HA database opportunity has entered a new phase of innovation, with a growing list of fresh market entrants. These entrants are often bunched under the NoSQL and/or NewSQL categories, and the following drivers are responsible for this burst of innovation:
- Enterprise 2.0 Business Models: Nearly all enterprises must become Web 2.0 companies, even the oldest brick-and-mortar companies on the planet. Take a look at the recent online buying results for Amazon and eBay, which run well in front of brick-and-mortar companies. With smartphones and tablets increasingly being used for both virtual window shopping and POS, the structure of business will remain in a high state of flux for the foreseeable future. That means the world's largest and most important databases will continue to be exposed to often unanticipated peak loads. When Web 2.0 becomes THE business or a significant portion of it, the need for highly distributed, performant 24x7x365 databases becomes paramount.
- Gross Margins Galore: Oracle, IBM, and Microsoft have held sway as the tier-1 enterprise database oligopoly for over 15 years now. Add SAP/Sybase and Teradata into the mix, and include the major variants from these five vendors, and you have easily covered 90% of enterprise database market share. But nothing lasts forever. When the market leader, Oracle, reports gross margins of 80% and EBITDA margins of 45%, with its gross margins for the past five years spanning a tight range of about 72-82%, innovators and price-sensitive customers undoubtedly notice an opportunity for disruption. ESG assumes the other leading enterprise database providers enjoy similarly healthy gross margins for their primary RDBMS offerings, typically in excess of 50%.
- Proven "Commercial Open Source": Enterprises no longer thumb their risk-averse noses at open source, particularly if there is a commercially supported version. Red Hat Linux remains the exemplar, but Oracle and IBM also have long histories of driving and/or supporting open source-based technologies. DataStax has its hands on the commercial open source baton for Cassandra, offering management/operations software, plus full training, implementation, and support services for its DataStax Enterprise version. It also offers free DataStax Enterprise for Startups and DataStax Community editions that include open source versions of its DataStax OpsCenter software.
Will Cassandra/DataStax become the Linux/Red Hat of Enterprise 2.0 NoSQL databases? It is easy to point to several successes at the intersection of community and commercial: Linux, Java, MySQL, and Eclipse easily come to mind. The next two years should prove most dynamic for Cassandra, DataStax, and other NoSQL/NewSQL providers, and most interesting for the established enterprise RDBMS leaders.