According to 2012 ESG research, 44% of enterprise organizations (i.e., those with more than 1,000 employees) considered their security data collection and analysis a “big data” application while another 44% believed that their security data collection and analysis would become a big data application within the following two years. Furthermore, 86% of enterprises collected substantially more or somewhat more security data than they had two years earlier. (Source: ESG Research Report, The Emerging Intersection Between Big Data and Security Analytics, November 2012.)
The ongoing trend is clear: large organizations are collecting, processing, and retaining more and more data for analysis, using an assortment of tools and services from vendors like IBM, Lancope, LogRhythm, Raytheon, RSA Security, and Splunk to make the data “actionable” for risk management and incident prevention/detection/response.
Now I do a lot of consulting around big data security analytics with security professionals and vendors, and these discussions tend to focus “up the stack” on the analytics applications themselves. Sometimes the talks center on security analytics infrastructure like Hadoop, HDFS, Pig, and Mahout; sometimes they cover things like UIs, visual analytics, and application integration.
Yup, everyone is interested in what big data security analytics applications do, but few if any people ever ask about the IT infrastructure foundation those applications require. As a result, many organizations strike out because they can’t even collect the security data they want to analyze!
Collecting and processing gigabytes or terabytes of security data takes planning and the deployment of big data security analytics plumbing, including:
- Packet capture appliances. These combine high-performance intelligent NICs from vendors like Cavium, Emulex, and Solarflare with disk drives and PCAP software like Wireshark, packaged together as turnkey appliances. They need to be fast enough to capture and process packets for an assortment of analytics engines at wire speed. PCAP hardware appliances will appear throughout the network at critical junction points, while virtual PCAP appliances will gain popularity in support of server virtualization and cloud platforms.
- Analytics distribution networks. Packet capture appliances collect and process the data, but it still needs to be moved across a multitude of analytics engines in near real time. This is the job of analytics distribution networks, made up of devices from companies like Anue, Apcon, BitTap, Gigamon, Netscout, and Riverbed. In some cases, analytics distribution networks will complement packet capture appliances; in other instances, they will provide lightweight PCAP functionality on their own. (Note: The industry term for this is “network packet brokers,” but that is way too device-centric for me, so I came up with my own.)
- SDN. The SDN programmable control plane will likely become a poor man’s analytics distribution network, but SDN is not likely to usurp analytics distribution networking equipment anytime soon. Rather, SDN will become part of the analytics infrastructure and complement PCAP and analytics distribution network functionality. Think of SDN and analytics distribution network integration introducing any-to-any connectivity between network data capture and analytics engines across an enterprise-wide analytics fabric (similar to a data center fabric à la TRILL and DCB).
- Analytics middleware. In many cases today, each analytics tool collects, processes, and routes its own data. This works fine for individual tools but introduces lots of redundancy, capital costs, and operational overhead. What’s needed here is some type of standards-based middleware for message queueing or publish-and-subscribe. For example, RSA Security uses open source RabbitMQ as common middleware between its analytics engines.
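To make the packet capture layer a bit more concrete, here is a minimal sketch of what those appliances ultimately produce. It assumes the classic little-endian libpcap file format (magic 0xA1B2C3D4, microsecond timestamps), not any particular vendor’s appliance API:

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4                # classic libpcap magic (little-endian, microsecond timestamps)
GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, ver_major, ver_minor, thiszone, sigfigs, snaplen, linktype
RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def read_pcap(data: bytes):
    """Yield (timestamp, raw_packet_bytes) pairs from a PCAP byte string."""
    magic, _maj, _min, _tz, _sf, _snaplen, _linktype = GLOBAL_HDR.unpack_from(data, 0)
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian microsecond PCAP capture")
    offset = GLOBAL_HDR.size
    while offset + RECORD_HDR.size <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = RECORD_HDR.unpack_from(data, offset)
        offset += RECORD_HDR.size
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len
```

A real appliance does this at wire speed in hardware and feeds the records to downstream engines; the sketch only shows the data structure the rest of the plumbing has to move around.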
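The “analytics distribution network” role can also be sketched in a few lines: captured traffic is matched against steering rules and copied to whichever engines want it. This is a toy model of broker-style filtering and fan-out; the rule fields and engine names are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SteeringRule:
    """Match a captured packet's metadata and name the engines that should see it."""
    protocol: str                 # e.g., "tcp" or "udp"
    dst_port: Optional[int]       # None matches any destination port
    engines: List[str] = field(default_factory=list)

def steer(packet: dict, rules: List[SteeringRule]) -> set:
    """Return the set of engine names this packet should be copied to."""
    targets = set()
    for rule in rules:
        if packet["protocol"] == rule.protocol and rule.dst_port in (None, packet["dst_port"]):
            targets.update(rule.engines)
    return targets

# Hypothetical steering policy: web traffic to an IDS, all TCP to a flow collector.
rules = [
    SteeringRule("tcp", 80, engines=["web-ids"]),
    SteeringRule("tcp", None, engines=["netflow-collector"]),
]
```

A real network packet broker applies this kind of policy in silicon across many capture points; the point here is just that distribution is a policy-driven routing problem, not a pile of one-off SPAN connections.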
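Finally, the middleware idea — collect once, publish to many consumers — can be illustrated with a toy in-process publish-and-subscribe bus. This is a sketch of the pattern, not RSA’s actual RabbitMQ deployment; the topic names and engine roles are hypothetical:

```python
from collections import defaultdict
from typing import Any, Callable

class SecurityDataBus:
    """Toy publish-and-subscribe bus: one capture source publishes, many engines consume."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Every subscriber on the topic gets its own copy of the message.
        for handler in self._subscribers[topic]:
            handler(message)

# Usage: one DNS capture feed drives two hypothetical analytics engines.
bus = SecurityDataBus()
ids_alerts, forensics_store = [], []
bus.subscribe("pcap.dns", ids_alerts.append)       # e.g., a detection engine
bus.subscribe("pcap.dns", forensics_store.append)  # e.g., a forensics archive
bus.publish("pcap.dns", {"src": "10.0.0.5", "qname": "example.com"})
```

With standards-based middleware (AMQP brokers like RabbitMQ play this role in production), each engine subscribes to the feeds it needs instead of collecting and routing its own data, which is exactly the redundancy the bullet above calls out.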
From an architectural perspective, it would be ideal to take a layered approach to big data security analytics, where the analytics engines are abstracted from the plumbing but can easily tap into it to customize security data collection, processing, and distribution. This would enable CIOs, CISOs, and network engineers to tailor their infrastructure, processes, and analytics engines; address their specific organizational and industry requirements; and manage capital and operating costs along the way.
In any case, there is a clear lesson here: you can’t collect, process, and route security data by simply connecting each analytics engine to a SPAN port. To avoid this trap, CIOs, CISOs, and network engineers need to align their plans for big data security analytics with the appropriate plumbing.