IT Chasms, Gaps, and A New World Order

I went to dinner last night with my pal, EMC bigwig Rich Napolitano, and a startup he knows called Plexxi. I've known Rich for many years, going back to his startup Pirus (acquired by Sun for way too much money, god bless them).

Now Plexxi is still in stealth mode so I won't unwrap them yet, but suffice it to say they are entering the world that I love - an enormously disruptive ($$) market ripe for inevitable change (the networking space) because of powerful, long-term secular trends (that they didn't have to create). All the pieces required for mega-change.

I can't say that Plexxi will be the next VMware, Facebook, or other smash, but at least they are smart enough to make sure the market they went into has the characteristics that make a mega-outcome possible. Most don't.

Which gets me to the point du jour - we still fundamentally architect our data centers as if it were 1984 - as if we still ran client/server, with 18 servers. Our networks are all designed to be massively interconnected (and OUTRAGEOUSLY expensive) and Core-based, with multiple layers beyond the core that all connect together. We connect everything to everything - thus requiring the same overprovisioning and underutilization that we've historically lived with in servers and storage. We use VMware to eliminate that issue with servers (theoretically), and we use all sorts of techniques to do that with storage (thin provisioning, QoS, etc.), but we still brute-force our networks.

Our networks were designed to deal with the internet, not the data center.

Our data centers don't have 18 servers anymore. They have 18,000. All connected together, even though consistency- or application-related groups probably only require 10-20 to actually be connected together at any given time in order to optimally execute a function. By necessity, when you connect them all, you create dependencies and overhead - when you get to 18,000, the overhead is absurd.
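To put a rough number on "absurd," here is a back-of-the-envelope sketch in Python. It is an illustration, not a model of any real network: it assumes "connected" simply means every pair of machines must be able to reach each other, and it counts those pairs. The 18, 18,000, and 20 figures come from the paragraph above; the function names are made up for this example, and real fabrics use switch hierarchies rather than literal point-to-point links.

```python
def pairwise_links(n: int) -> int:
    """Distinct pairs among n endpoints: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def grouped_links(total: int, group_size: int) -> int:
    """Pairs needed if endpoints only ever talk within fixed-size groups.

    Assumption for illustration: groups are disjoint and any leftover
    machines form one smaller group of their own.
    """
    full_groups, remainder = divmod(total, group_size)
    return full_groups * pairwise_links(group_size) + pairwise_links(remainder)


if __name__ == "__main__":
    servers = 18_000
    group = 20  # upper end of the 10-20 range mentioned above

    everything = pairwise_links(servers)
    grouped = grouped_links(servers, group)

    print(f"1984-style data center (18 servers): {pairwise_links(18):,} pairs")
    print(f"Everything-to-everything ({servers:,}): {everything:,} pairs")
    print(f"Groups of {group}: {grouped:,} pairs")
    print(f"Ratio: roughly {everything / grouped:,.0f}x")
```

Under those assumptions, the any-to-any count grows with the square of the server count, while the grouped count grows only linearly. That gap is the overhead the brute-force approach pays for.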

It's why Cisco can't wait for you to get off the dime and start upgrading to 10G or 40G - which you have to do in this type of architecture - to support 18,000 machines connected together. It's a HUGE payday for them - and it sucks bad for you.

Thus, if we are ever to reach that magic nirvana state of truly liquid IT (utility, fluid, etc.) where resources are grouped, stood up, and torn down on demand and optimized for the function they are required to perform, we can't keep networking like it's 1984. There has to be a better way.

The new world order is just that - when you are forced to deal at scale, you simply can't deal with architectures that were never intended to be used that way. We are force-fitting old thinking into new requirements, badly. It can't end well.

The Gaps:

I've spoken previously about the forced tension IT deals with because Infrastructure/Ops groups and Apps/Business groups are motivated by different things. In short, the Inf/Ops team is motivated to do more with less, get better utilization, share stuff, keep the lights on, etc. The Apps group doesn't care about any of those things. They care about cycle time - the time to get an app developed, tested, deployed, and supported. Anything that slows down that cycle is bad. Anything that "might" slow down that cycle is bad. Having shared infrastructure with moving workloads represents a clear and present danger to the Apps group. They will shoot you if you put something on their stack that causes a problem. Thus, they have always favored stovepiped stacks. Keep your hands off of my stack, to quote my pals at Pink Floyd. (I'm pretty sure that IT is exactly what they were referring to).

Anyhow, the gap is widening and the tension is forcing bad decision making. There is a reason why the #1 corporate users on Amazon are developers. They don't want to wait 4 weeks for Inf/Ops to stand up a development stack for them. They can do it in 4 minutes on EC2/S3 (at an exact known cost). I'm not advocating it (au contraire, it causes compliance NIGHTMARES) - but I understand it. It's their motivation. Shorter cycle time has an absolute, direct bottom-line impact on the business.

Meanwhile, back in Operations City, this melange of virtualization enablement is causing - for the first time in DECADES - stovepiped infrastructure functions (aka Chuck in storage, Bill in networking, etc.) to actually come out of their cubes and TALK to each other. Why? Because there are monster dependency factors now - no one can operate in a vacuum anymore. I believe this is not only good for those folks, but good for society in general. We've lost the art of actually looking at someone when we speak. We'd rather text them.

I digress.

So Chuck needs to know things about networking, and servers, and virtualization now. He can't just know how to twiddle the bin file and expect to remain a valuable member of the staff. You HAVE to know more about what goes on around you because that's simply the reality of the day. Otherwise, the virtualization admin will start to listen to VMware and sooner or later start controlling all your knobs from their console. (VMware has really interesting networking and storage stuff built in, that no one uses yet....uh-oh....).

The IT vendor community has an opportunity to either help or suffer the same fate as Chuck if he doesn't want to play nice with others - be eliminated eventually. IT vendors LOVE to act like they speak the language of others, but they don't. And they sure as shit don't speak the business/apps language, no matter how much they tell you they do. They sell knobs to guys in the basement who like to twiddle knobs. Specialty knobs to knob-twiddling specialists. Both used to be very valuable. Now they are about to be extinct. Like the buyers, the vendors need to learn the interdependencies of their stuff in the new world order, and help their users cross that chasm. Lest they end up sad and lonely.

Which gets me to the biggest chasm, and opportunity, of all. Those IT pros, and associated vendors, who can help bridge the gap between Inf/Ops and the Apps/Business have the greatest opportunity for success. Inf/Ops is the tail of the dog - they get requirements YEARS after the core business decisions are made. They fight for scraps. If you can insert yourself into the process earlier, you have a better chance of a favorable (and more valuable) outcome.

The bigger truth is this: telling a storage buyer that your stuff is awesome because he can go faster running VMware is cool, but telling the App owner that your storage features will enable them to cut test and QA time by 30% is where the money is. Make your network choice based on application requirements first, not last, and you'll find you have a whole new (and more powerful) set of friends willing to talk to you.


You can read Steve's other blog entries at The Bigger Truth.
