Without a doubt the 2012 winner in big data databases is HBase, the open source db used natively when you use Apache Hadoop. However, the goal here is to pick the commercial winners, disqualifying HBase. We need instead to pick from the pool of commercial Not Only SQL databases, but as this infographic suggests, Not Only SQL databases do not only deal with big data analytics use cases.
What is misleading about the infographic, however, is that it suggests that the databases depicted are only good for the particular segment to which they have been assigned. Not true! While it is clear which databases ARE used primarily for big data analytics (the MPP/Analytics and Graph databases), other databases may also be used for big data, either directly or as a close cousin. In fact, two of the three finalists for big data database of the year do not originate from analytics database categories:
MongoDB is a phenom. "MongoDB" quivers on the lips and at the fingertips of passionate entrepreneurial developers more than any other database. The community development and market education magic created by 10gen is today's equivalent of the rush of interest that created the MySQL phenomenon a decade ago. On the surface, MongoDB seems primarily positioned for JSON-powered content/document management use cases, but the number of users trying MongoDB as a real-time database cousin to Hadoop's batch capabilities for big data purposes grew rapidly in the latter half of 2012. From what we can tell, actual production big data use cases remain a minority of implementations, but the footprint of MongoDB, at least at the periphery of the big data community, expanded considerably during the latter half of 2012.
Some big data database cognoscenti sniff at the success of DataStax, because DataStax offers the commercial version of the Apache Cassandra database, an alternative to HBase, which is more directly aligned with Hadoop. But DataStax has proven that focused community development, plus enterprise-class support, go-to-market, and channel development, can often trump Hadoop technical purity. DataStax has a rapidly growing commercial and enterprise-class customer base to show for its efforts. Though DataStax isn't only used for big data, it often sits next to and interoperates with other analytics databases, and it has rapidly become a utility knife database for big data and search, with more of an enterprise angle than MongoDB.
SAP HANA has changed the conversation in the realm of enterprise databases, and, despite SAP's marketing efforts, the word that comes to mind when "SAP" and "big data" are mentioned together is not "in memory" but "HANA." After over a decade of making database bets, and seeing Oracle outflank them, SAP has finally engineered a real threat with HANA. During 2012, HANA was focused on SAP-oriented BI/analytics, but SAP just announced that HANA will be available for the OLTP use case of SAP Business Suite in February. HANA, like EMC’s Greenplum and Teradata's Aster, is a full-bodied analytics platform, but the HANA database claimed more notoriety in 2012. During 2012 HANA often won in SAP app shops looking to move forward with big data projects, beating some very well-known relational databases. SAP carefully assembled a select set of resellers, all with a huge channel presence and primed for HANA, and now offers HANA through Amazon Web Services. ESG’s thinking here, though, is that HANA’s repositioning towards “OLTP plus OLAP,” with its fresh focus on transactions and operational BI, suggests less focus on big data style advanced analytics, though only the market will tell the full story over the coming couple of years.
Winner: 10gen. MongoDB’s proliferation and momentum in 2012 were unmatched, across a variety of use cases including but certainly not limited to MapReduce and analytics. We aren’t the only ones impressed: see "10gen Announces Strategic Investment from Intel Capital and Red Hat."
Big Data Database in 2013: Thanks for the Memories
Intel must be licking its chops. Despite Oracle’s attempt to push SPARC, all the other big data analytics databases were either designed to take advantage of Intel multi-core processors and the falling cost of memory, or are being back-fitted to do so, or are wrapped in appliances that at least emulate "database on a chip" to some degree. This trend shows no signs of abating in 2013, and new database versions will march out that better leverage core/memory, resulting in a yearlong game of "my database is faster than yours for big data." And with good reason: As big data analytics shifts from batch to real-time, and from a nice-to-have to a must-have, delivering complex analytics results in as short a time as possible will become more desirable. Therefore, ESG believes that databases designed primarily for analytics will widen their lead over more general use-case databases. Those MPP and graph analytics databases, at least in the enterprise, will gain share in 2013.
Speaking of Oracle, despite its memory/core optimized engineered systems, ESG believes Oracle will need either to raise its investment in and the profile of the Oracle NoSQL database, or to make a technology acquisition to offset the analytics-focused competition. IBM may be in the same situation: some of its non-purely-relational offerings have grown long in the tooth, and it may have to bet on a later-generation analytics database through acquisition. Microsoft SQL Server 2012 Column Store, however, will likely receive more channel education, push, and press for big data use cases; in the MISO oligopoly (Microsoft, IBM, SAP, Oracle), MSFT and SAP are under less pressure to address big data database technology, but Microsoft certainly needs to do a better job of getting the word out.
But just as the Big Data Hardware blog from yesterday ended with a discussion of the threat coming from Amazon Web Services and cloud, we must again introduce Amazon DynamoDB into the analytics database discussion for 2013. While Amazon Elastic MapReduce already offers a cloud-based Hadoop implementation, using HBase, there is no reason to doubt that DynamoDB will be pushed more towards a central role for analytics purposes in 2013. While we believe that DynamoDB is more palatable to the enterprise analytics developer, the Web 2.0 crowd has also jumped on Google BigQuery for analytics purposes, with some of the leading-edge types from enterprises also showing interest.
Given ALL the databases out there for big data analytics, whether on-premises, on-appliance, or in-cloud, ESG believes the market is quite saturated, and certainly confusing for the overwhelmed enterprise DBA trying to support big data projects. Therefore 2013 is the year for MISO, and we would include Teradata Aster and EMC Greenplum in this mix, to provide comfort and assurance that the DBA can stick with her/his established provider and not sacrifice the state-of-the-art by doing so.