"Loaded up and trucking" is the phrase that comes to mind when I reflect on last week's Spark Summit in New York City. There is a ton of activity and momentum around Spark as the focal point for big data and analytics. While this conference was still relatively small, it was clear attendees were keen to get their heads and hands on the latest code. And while it was readily apparent that this is a relatively young technology space, that youthful energy and creativity are rapidly overcoming the rough edges. I only wish it felt a bit more mature, but we're getting there.
Take a look at the video above for my "man on the scene" coverage.
Announcer: The following is an ESG On Location video.
Nik: Hi. I'm here at Spark Summit East in New York City. It's been a very interesting show so far. We've had more lines of code in keynotes than I've seen in some time at a tech conference. I'm not sure that's a good thing. It suggests to me this is still early stage, still complicated and difficult to do. I've seen a lot of people here discussing how they do complex things related to governance. How do they address architecture? There are still a lot of questions. But the nice thing is it's been well grounded in the use cases.
We've had Capital One, we've had eBay, we've had people articulate how they use the technologies to do the things they want to do with Spark. They want to do streaming. They want to do machine learning. They want to use SQL or OLAP and be able to work with their Hadoop environments, or they want to be able to do graph processing, all these different functionalities. People are really here to understand how to solve complex technical problems, but apply them to business use cases.
Something else we've noticed is there are some technology shifts going on, but there are some environmental shifts at the same time. People are moving away from a pure bare-metal deployment of big data and looking at virtualization, running on VMs, using Docker to contain their applications for analytics, or even parts of their big data platform. We're also seeing that shift happen toward cloud, and I think Databricks has been vindicated here, as people are saying, "I don't want to have to worry as much about building the underlying infrastructure; I want to focus on how I build my data platform and how I set up my analytics." And of course, other cloud players are also active in this space; people are using Amazon extensively, and Microsoft. Maybe not so much the high-value-added services that we would have expected, but give me an environment where I can set this up, where I can be elastic, maybe more secure than I am on premises, maybe more available, but certainly with quicker time to value.
A lot of the news here has been around Spark 2.0, obviously the next major iteration in the technology platform. I think one of the key pieces is that they want to make streaming just work for people: not having to think about re-engineering an application to work in a streaming environment, but really making it transparent so that the data scientist or data engineer can focus on getting the value, not rewriting and redesigning everything. Alongside that, Spark 2.0 has a new API foundation coming out. They're looking at how to better link up Kafka, file systems, databases, and other extensions to the platform to make it more consumable.
At the same time, we're seeing other technologies still emerge. Apache Arrow is a big deal, bringing an in-memory columnar data format that can be used across different technologies and platforms in the environment. And even Google is starting to make inroads. I've heard many references to things like Google Dataflow and Google TensorFlow. It's interesting to see that player not just be looked up to as somebody who can do it well, but as somebody who can deliver technology out to the field.
All in all, it's been a really great show, with a lot of momentum and a lot of interest in Spark. In two years, Spark has caught up to where Hadoop is. It's really interesting to see the growth and the new outreach going on.