I attended the Network Automation Meetup in San Francisco. The topic was Practical Infrastructure as Code, presented by Matt Stone of Brocade; the event was hosted at Cumulus Networks’ offices, with food and refreshments provided by Hewlett Packard. In the world of meetups, all parties are friendly even though they compete commercially. Matt said he was not an official spokesman for Brocade, but I believe many of his views are aligned with what Brocade does in its New IP initiatives.
The topic was how to treat the management of infrastructure with the methods used for managing code. The cycle consists of 1) Build, Test & Validate, 2) Deploy, 3) Monitor & Remediation and 4) Source & Revision Control, as shown in the diagram to the right.
This ties the automation of infrastructure via scripts to tools designed for conventional programming. The tools he described are commonly found in open source-based tool kits (there are some commercially supported versions too). His point was that these tools used to be hard to use and unreliable, but they have come a long way in the last few years and are now practical for network automation.
He discussed the cycle phase by phase. (Some of this consists of a laundry list of apps, but that may be useful for beginners.)
Source and Revision Control
Git, Subversion (SVN), Mercurial, Gerrit and Phabricator were mentioned for this area, with the expectation that the operations team can examine automation code and perform peer reviews on it before it is rolled out. This is all fine; peer review improves code quality and reliability.
Rather than the CLI or actual programming languages, the code was based on scripting systems like Ansible, which are more readable for network admins.
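To make the readability point concrete, here is a minimal sketch of the template-driven config generation such tools perform, using Python's stdlib `string.Template` as a stand-in for Ansible's Jinja2 templating; the hostname, VLAN values and variable names are made up for illustration.

```python
from string import Template

# Hypothetical per-device variables, analogous to Ansible host_vars.
host_vars = {"hostname": "leaf01", "vlan_id": "100", "vlan_name": "servers"}

# A config snippet template, analogous to a Jinja2 template in a role.
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "vlan $vlan_id\n"
    "  name $vlan_name\n"
)

def render_config(variables: dict) -> str:
    """Render a device config snippet from per-host variables."""
    return CONFIG_TEMPLATE.substitute(variables)

print(render_config(host_vars))
```

The point is separation of concerns: admins review small, readable variable files in revision control, while the template stays fixed and tested.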
Build, Test and Validate
Again, tools used for continuous integration, such as Jenkins, Travis CI, Parallel CI and ServerSpec, were discussed, but Matt said this is an area that still requires work. Not everyone has a test environment to spare, and even those that exist are unlikely to mirror production. This is a classic issue with any code; it is possible to create test environments from VMs running network appliances, but they are often far from what would be found in a practical deployment.
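As a small illustration of what pre-deployment validation can look like, here is a sketch in the spirit of ServerSpec that checks a rendered device config against a policy before rollout; the required and forbidden lines are hypothetical examples.

```python
# Hypothetical policy: lines every config must contain, and lines
# that must never appear. A CI job would run this on each change.
REQUIRED_LINES = {"service password-encryption", "no ip http server"}
FORBIDDEN_LINES = {"enable password cisco"}

def validate_config(config: str) -> list[str]:
    """Return a list of policy violations found in a rendered config."""
    lines = {line.strip() for line in config.splitlines()}
    errors = []
    for required in sorted(REQUIRED_LINES - lines):
        errors.append(f"missing required line: {required}")
    for forbidden in sorted(FORBIDDEN_LINES & lines):
        errors.append(f"forbidden line present: {forbidden}")
    return errors

good = "no ip http server\nservice password-encryption\n"
bad = "enable password cisco\n"
print(validate_config(good))  # []
```

Checks like this run against text, so they need no spare lab at all, which sidesteps part of the test-environment problem for the simpler failure modes.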
Deploy
The deployment tools are the common DevOps tools: Ansible, Puppet, Chef, and even expect scripts. Matt emphasized that we ought not to scoff at expect, since its CLI scraping has been debugged over many years and is almost certain to work. He also threw in Neutron as a programmable interface for OpenStack environments.
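The scraping half of what an expect script does can be sketched as follows; this assumes a hypothetical `show interfaces brief` output format and uses a regular expression to turn the captured CLI text into structured data.

```python
import re

# Hypothetical "show interfaces brief" output, the kind of text an
# expect script would capture from a device CLI session.
CLI_OUTPUT = """\
Interface      Status    Protocol
Ethernet1      up        up
Ethernet2      down      down
Ethernet3      up        up
"""

LINE_RE = re.compile(r"^(\S+)\s+(up|down)\s+(up|down)\s*$", re.MULTILINE)

def scrape_interfaces(text: str) -> dict[str, str]:
    """Turn scraped CLI text into an {interface: status} map."""
    return {name: status for name, status, _proto in LINE_RE.findall(text)}

print(scrape_interfaces(CLI_OUTPUT))
```

The fragility Matt alluded to is visible here: the regex encodes one vendor's output format, so a firmware update that reformats the table breaks the automation.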
Monitoring & Remediation
This is the area I find most interesting. There are many tools, such as Sensu, InfluxDB/Grafana, Fluentd/LogStash, ElasticSearch/Kibana and StackStorm. These are data analysis and visualization tools that offer a dashboard for network status. Here's a screenshot he shared of Kibana:
Kibana screen shot
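To show how metrics get into a system like InfluxDB in the first place, here is a simplified sketch of formatting a counter sample in InfluxDB's line protocol; the measurement, tag and field names are made up, and real line protocol adds type suffixes and escaping that this omits.

```python
import time

def to_line_protocol(measurement, tags, fields, ts=None):
    """Format one sample as a simplified InfluxDB line protocol string:
    measurement,tag=val,... field=val,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ts = ts if ts is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

# Hypothetical interface counters from a single poll of one device.
line = to_line_protocol("ifcounters",
                        {"host": "leaf01", "if": "eth0"},
                        {"rx_bytes": 1024, "tx_bytes": 2048}, ts=1)
print(line)  # ifcounters,host=leaf01,if=eth0 rx_bytes=1024,tx_bytes=2048 1
```

A collector would POST lines like this to the database, and Grafana or Kibana then queries the stored series to draw the dashboards.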
Is it for me (an enterprise user)?
All of this is nice to see, but there’s a balance of utility versus effort that I want to consider. As Matt said, the tools have become more stable, and some, such as Git, are mainstream development tools. The use of development and CI concepts in infrastructure management is interesting, although in a production context it’s a bit more difficult to push out changes very frequently.
Although the tools are getting good individually, I wonder how difficult it is to perform the following:
- Integrate. The glue that makes it fit all together may be non-trivial. These individual pieces are not necessarily tested to work with each other, and are rapidly changing so incompatibilities may creep in. I suspect that unless the organization is a large enterprise or service provider, there may not be a big enough staff to keep them up to date and tested.
- Scale. What works in a test lab may not work in large-scale production. Systems like ElasticSearch have clustering capability for scaling horizontally, but I’m not sure all the other components that integrate with it can keep up with the data.
- Support. For end users looking to experiment, using these free tools is a fine thing to do, with shouts of “knock yourself out,” but be forewarned that you’re on your own if you run into problems later on. Some of the very same open source tool companies offer commercial editions or cloud-hosted SaaS versions, so that may be a safer option.
There are also commercial analysis and visualization products, including those from startups such as Kentik Detect, which is designed for network visibility at scale, and established vendors such as NetScout, Riverbed and Savvius, which have numerous tools that provide visibility. Automation via scripting is still in its early stages, but APIs that support scripts are available from mainstream networking equipment vendors and from startup disaggregated switch OS vendors, so the underpinnings have become available.
What I recommend is that if you have little or no experience with network automation, using these free tools is a great way to start, and you may learn a lot in the process.
But be forewarned that an enterprise network administrator's goal is to design, build, deploy and operate the network; putting too much effort into coding will start to be a distraction and a time sink, and may well increase OpEx compared to using an off-the-shelf product. I recommend that you time-box your efforts. You may be good and lucky and get a setup running in no time. Otherwise, you can spend time with no end in sight getting it working, and there is an opportunity cost to be paid.
Different areas of application monitoring and infrastructure monitoring have started to blend together, and products such as Riverbed's SteelCentral have become unified suites that provide end-to-end monitoring for networks and apps. My prior blog described some monitoring tools used for a Kubernetes and container-based infrastructure, which may assist in your efforts.