Go back over two decades and the number one analytics "client" software was Excel. Excel first supported enhanced charting in Excel 3.0 way back in 1990. Go back over one decade, and despite the onslaught of the Web, Excel maintained its position as premier analytics client software, this time enhanced with pivot table capabilities available in Excel 2000.
Since 1990 Excel has closely mapped to the BI/analytics trend du jour; for example, OLAP gained popularity in the late 1990s, making Excel pivot tables pertinent. The original use case of Excel was as a sophisticated, scriptable calculator. Its second use case became that of an analytics tool and display engine for the masses, and even for the data analyst.
Big Data: The Era of 6 Vs – Volume, Variety, Velocity, Veracity, Visualization, Value
But the analytics game has changed yet again. Today we live in the era of "big data," and big data carries more demands and expectations: more data volume, more data sources and variety, data arriving more quickly and in real time, and, as a result, greater importance for information governance and quality. Heretofore big data has often been coalesced into only 3 Vs (volume, velocity, and variety), and IBM likes to cite a fourth, veracity, which completely makes sense to me. Visualization, my 5th V, has become more sophisticated, more compelling, more integrated, and faster. The analytics project or application that effectively deals with all of these 5 Vs, and is driven by analysts knowledgeable about business, has a good chance of delivering on the 6th and most important V, value, as in business value.
Excel 2013 and the 6 Vs
Given all of this change, will Microsoft stand pat, letting Excel fade into the background of analytics? No way. Excel 2013 will step up to several of these new-era analytics challenges, primarily through enhancements to PowerPivot and PowerView, both of which are already available as add-ons to Excel 2010 but will come as a native part of Excel 2013. Microsoft calculates that Excel 2013 will address 90% of users' analytics needs. I guess Microsoft is willing to leave the other 10% to data scientists of the Ph.D. variety. Take a look at a quick preview of the power of Excel 2013 analyzing some just-completed Olympics data, and here is a quick synopsis of how Excel 2013 fares with the 6 Vs:
- Volume: Excel will use an embedded columnar, in-memory data management engine called xVelocity, which Microsoft claims can support hundreds of millions of rows of data. Assume, just for fun, 256 double-byte characters per row and one hundred million rows, and we are looking at roughly 51 GB of raw data, well under one-tenth of a terabyte: not exactly big data, but certainly very large by spreadsheet standards. I think we could call this "pretty big data."
- Velocity: In Excel 2013, though you will not be able to use streams as a data source, you will be able to ingest many records quickly, and run complex queries rapidly. Thus, Microsoft does not step up to the "real-time" aspects of "velocity," but then again neither does native Hadoop. ESG expects Microsoft to address data streams in a later enhancement or later version.
- Variety: There are improvements here, but mainly on the structured side of data. The number of structured data sources grows seriously, well past SQL Server and Access, and it includes the ability to dip into the Azure Marketplace data as a source, plus a variety of other RDBMS and analytical data sources (Oracle, Sybase, Teradata, etc.). However, the ability to ingest and analyze unstructured and semi-structured data, such as social and web sources, remains mainly outside the realm of Excel 2013.
- Veracity: Microsoft will add useful features on this front in Excel 2013, including something Microsoft calls Flash Fill which, using text analytics, allows for fast parsing of a dataset to cull out previously uncategorized data (e.g., extracting cities from raw address text). They could have named it "create and fill another column really fast and easily." Hmm, maybe Flash Fill is better. Microsoft also added data modeling capabilities to allow for mapping and some basic transformation features. Some might place this feature under "visualization," but to me these capabilities focus on understanding the data you are working with before the analysis. Finally, Excel 2013 will add auditing and tracing features often found in e-Discovery solutions. Does Excel 2013 replace the full, enterprise-class data integration and governance features you might find from Informatica or IBM, or the e-Discovery you might find from Symantec? No, but Excel 2013's "veracity" features may be all you need for many sandboxed analytics projects.
- Visualization: The PowerPivot and PowerView add-ons have already taken Excel 2010 to previously unforeseen sophistication in terms of visualization, and the 2013 version, particularly in terms of PowerView, will add another long step. Drill-down and drill-up are supported not only in the Excel 2013 version, but also in an HTML5-rendered version which can be viewed from SharePoint in any browser. Some subtle but visually powerful features, such as hyperlinks, background images, and KPIs, have been added. There is also a chart recommendation engine, which looks at the data you want to visualize and suggests the best ways to view the data.
- Value: "Value" is perhaps the most difficult V to quantify of the 6 Vs, but given that you will pay nothing beyond the standard Excel license (or more likely an Office 2013 license) for these 5V feature enhancements, Excel offers the data analyst great value for the money. In terms of turning Excel analytical features into something of value for a data analyst's organization, well, that is up to each data analyst.
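For anyone who wants to check the "just for fun" volume estimate in the Volume bullet above, here is a minimal Python sketch of the arithmetic. It uses the figures from the text (256 double-byte characters per row, one hundred million rows) and rests on a few assumptions of mine: "double-byte" means exactly 2 bytes per character, a terabyte is taken as 10^12 bytes, and any compression a columnar engine would apply is ignored. The function name `raw_size_bytes` is made up for illustration and has nothing to do with Excel or xVelocity.

```python
# Back-of-the-envelope estimate of raw (uncompressed) data size.
# Assumptions: 2 bytes per character, decimal units (1 TB = 10**12 bytes),
# and no columnar compression. Figures come from the Volume bullet.

BYTES_PER_CHAR = 2        # "double-byte" characters
CHARS_PER_ROW = 256       # assumed row width
ROWS = 100_000_000        # one hundred million rows


def raw_size_bytes(rows: int,
                   chars_per_row: int = CHARS_PER_ROW,
                   bytes_per_char: int = BYTES_PER_CHAR) -> int:
    """Return the raw size in bytes of `rows` fixed-width text rows."""
    return rows * chars_per_row * bytes_per_char


size = raw_size_bytes(ROWS)
size_gb = size / 10**9
size_tb = size / 10**12

print(f"{size_gb:.1f} GB, or {size_tb:.4f} TB")
```

Under these assumptions the raw data comes to about 51 GB, comfortably under one-tenth of a terabyte. Since a columnar in-memory engine typically compresses data, the real memory footprint could well be smaller, but that depends on the data and is not something this sketch tries to model.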
The hype, promise, and in many cases, delivery-on-the-promise of big data, with Hadoop in the middle, has turned the analytics supply-side on its ear. Proving value by optimally managing data volume, variety, velocity, and veracity, and offering insights with compelling visualization, make up the mantra of the 6 Vs of big data. Microsoft has turned Excel 2013 on its own analytics ear in order to remain pertinent to the data analyst, and maintain its role as the de facto most popular analytics client.