This report documents a 2017 ESG Lab audit and validation of Cisco HyperFlex hyperconverged infrastructure (HCI) performance testing, which focused on comparisons of Cisco HyperFlex hybrid and all-flash solutions with anonymous competitive HCI solutions.
Note for 2018 updates made to this report: Test results remain unchanged from the original 2017 report, but some features described in the HyperFlex section have been updated to reflect the current feature set included in the HX 3.0 release.
Organizations today must be extremely flexible, with the ability to add applications and virtual machines (VMs) quickly to handle the speed of business. This is extremely difficult to achieve with silos of compute, network, and storage gear that are static and require individual management. This is one reason for the popularity of hyperconverged infrastructure (HCI). HCI offers a single, centrally managed unit with software-defined compute, network, and storage that is flexible, scalable, and easy to deploy.
ESG research continues to confirm the popularity of HCI: In a recent study, 57% of respondents reported that they currently use or plan to use HCI solutions.1 This is not surprising given the factors driving them to consider HCI. Deployment drivers most cited by respondents include improved scalability (31%), total cost of ownership (28%), ease of deployment (26%), and simplified systems management (24%). This isn’t the most interesting statistic, however. As detailed in Figure 1, 68% of organizations are running more than 20% of their production workloads on hyperconverged technology solutions today and 31% of organizations are running more than 30% of their production apps on HCI.2
As these numbers increase, they bring with them challenges, and performance still sits near the top of the list. Simplicity is no longer the only priority; as more HCI solutions have come to market, the key buying criteria have expanded to include performance. Many solutions still cannot deliver the consistent high performance that mission-critical workloads demand.
Cisco HyperFlex is a complete hyperconverged system that combines compute, network, and storage in a fully integrated, fully engineered platform designed to scale resources independently and deliver consistent high performance. Cisco HyperFlex is engineered on Cisco UCS, combining the benefits of the UCS platform (such as policy-based automation for servers and networking) with those of the HX Data Platform’s distributed filesystem for hyperconvergence.
It supports edge-to-edge workloads from mission-critical core data center applications to remote locations. The latest update adds support for Microsoft Hyper-V in addition to VMware ESXi along with support for bare metal, multi-cloud, and containerized environments. HyperFlex deployments require a minimum three-node cluster for high availability, with data replicated across at least two nodes, and a third node to protect against single-node failure.
HyperFlex HX-Series Nodes are powered by the latest generation of Intel Xeon processors, and comprise:
- Cisco UCS servers. Both blade and rack servers can be combined in the cluster, with a single network hop between any two nodes for maximum east-west bandwidth and low latency. HyperFlex lets you alter the ratio of CPU-intensive blades to storage-intensive capacity nodes so users can optimize the system as application needs shift. All-flash and hybrid nodes are available.
- Cisco HyperFlex HX Data Platform for software-defined storage. Operating as a controller on each node, the HX Data Platform is a high-performance, distributed file system that combines all SSD and HDD capacity across the cluster into a distributed, multi-tier, object-based data store, striping data evenly across the cluster. It also delivers enterprise data services such as snapshots, thin provisioning, and instant clones. Policy-based data replication across the cluster ensures high availability. Dynamic data placement in memory, cache, and capacity tiers optimizes application performance, while inline, always-on deduplication and compression optimize capacity.
- The HX Data Platform handles all read and write requests for volumes accessed by the hypervisor. Striping data evenly across the cluster avoids network and storage hotspots, so VMs enjoy optimal I/O performance regardless of location. Writes go to local SSD cache and are replicated to remote SSD in parallel before the write is acknowledged. Reads are served from local SSD if possible; otherwise, they are retrieved from remote SSD.
- The log-structured file system is a distributed object store that uses a configurable SSD cache to speed reads and writes, with capacity in HDD (hybrid) or larger SSD (all-flash) persistent tiers. When data is de-staged to persistent tiers, a single sequential operation that writes a large amount of data enhances performance. Inline deduplication and compression occur when data is de-staged; data movement happens after the write is acknowledged so there is no performance impact.
- Cisco Unified Fabric/UCS 6200 Fabric Interconnects enable software-defined networking. High bandwidth, low latency, and 40Gbps and 10Gbps connectivity in the fabric enable high availability as data is securely distributed and replicated across the cluster. The network scales easily, and each connection is fully secure. The single-hop architecture enhances cluster performance.
- Cisco Application Centric Infrastructure (ACI) for automated provisioning. ACI enables automation of network deployment, application services, security policies, and workload placement per defined service profiles. This provides faster, more accurate, more secure, lower-cost deployments. ACI automatically routes traffic to optimize performance and resource utilization and re-routes traffic around hotspots.
- Choice of industry-leading hypervisors including VMware ESXi and vCenter as well as Microsoft Hyper-V. The hypervisor and management application come pre-installed, providing a familiar management interface for all hardware and software.
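The read and write path described in the component list above can be illustrated with a toy model. This is only a sketch of the stated behavior (write to local cache, replicate to remote cache in parallel, acknowledge once all copies persist; read locally when possible). The `Node` class, `replicated_write`, and `read` are hypothetical names invented for illustration and are not part of the HX Data Platform.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Toy stand-in for one cluster node's SSD cache (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def persist(self, key, value):
        # Store the block in this node's cache and report success.
        self.cache[key] = value
        return True


def replicated_write(local, remotes, key, value):
    """Write to the local cache and every remote copy in parallel.

    Returns True (the acknowledgment) only after all copies have persisted.
    """
    targets = [local, *remotes]
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = list(pool.map(lambda node: node.persist(key, value), targets))
    return all(results)


def read(local, remotes, key):
    """Serve from the local cache when possible, otherwise from a remote copy."""
    if key in local.cache:
        return local.cache[key]
    for node in remotes:
        if key in node.cache:
            return node.cache[key]
    raise KeyError(key)


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    acked = replicated_write(a, [b], "block-7", b"payload")
    # Node c holds no copy, so the read falls through to a remote node.
    print(acked, read(c, [a, b], "block-7"))
```

The point of the model is the ordering: the caller sees the acknowledgment only after the parallel writes have all completed, so the extra copy costs roughly one network round trip rather than a second serialized write.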
Cisco HyperFlex delivers numerous benefits, including:
- High performance. In addition to the performance features mentioned above, HyperFlex securely distributes data across servers and storage in the cluster to reduce bottlenecks.
- Fast, easy deployment. This pre-integrated cluster can be deployed just by plugging into the network and applying power. Node configuration and connection are handled through Cisco UCS service profiles. Cisco says that customers report typical deployment times of less than one hour.
- Consolidated management. Systems are monitored and managed through Cisco HyperFlex Connect or Cisco Intersight, which eliminates separate management silos for compute and storage. HyperFlex Connect lets organizations manage and monitor clusters from anywhere and at any time with metrics and trends to support the entire management lifecycle. Intersight is an optional cloud-based platform that allows users to manage all their Cisco HyperFlex and Cisco Unified Computing System (Cisco UCS) infrastructure including traditional, hyperconverged, edge, and remote/branch offices through a single cloud-based GUI.
- Independent scaling. Different from other HCI systems, HyperFlex can independently scale compute and storage by adding or subtracting either servers or individual drives; data is automatically rebalanced. This provides the right resources for different application needs, instead of scaling in predefined node increments that add software licensing costs.
ESG Lab Validation
Testing was conducted using industry-standard tools and methodologies and was focused on HyperFlex hybrid and all-flash performance with comparisons to unnamed alternative solutions. These solutions included two “software-only” systems from leading vendors that leveraged standard x86-based servers, and a proprietary system from a single vendor based on its own hardware and partially integrated with its own software.
The bulk of the testing used HCIBench, an industry-standard tool designed to test the performance of HCI clusters running virtual machines. HCIBench leverages Oracle’s Vdbench tool and automates the end-to-end process that includes deploying test VMs, coordinating workload runs, aggregating test results, and collecting data.
This extensive testing was executed using a stringent methodology including many months of baselining and iterative testing. While it is often easier to generate good performance numbers with a short test, benchmarks were run for long periods of time to observe performance as it would occur in a customer’s environment. In addition, tests were run many times, never back to back but separated by hours and days, and the results were averaged. These efforts add credibility by reducing the chances that results were influenced by chance circumstances. Also, testing was conducted using data sets large enough to ensure that data did not remain in cache but leveraged the back-end disk across each cluster.3
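As a minimal sketch of the averaging step, repeated runs can be reduced to a mean plus a spread figure, which shows whether any single run was an outlier. The function name `summarize_runs` is hypothetical, and the run values below are placeholders, not measurements from this report.

```python
from statistics import mean, stdev


def summarize_runs(iops_per_run):
    """Return the mean and sample standard deviation over repeated runs."""
    if len(iops_per_run) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(iops_per_run), stdev(iops_per_run)


# Placeholder values for illustration only; NOT results from this report.
runs = [100_000, 102_000, 98_000]
avg, spread = summarize_runs(runs)
print(f"mean={avg:.0f} IOPS, stdev={spread:.0f} IOPS")
```

A small standard deviation relative to the mean suggests the averaged figure is representative of each individual run.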
Testing of hybrid solutions included both SSD and HDD. The hybrid test bed included a four-node HyperFlex HX220c cluster with one 480GB SSD for cache and six 1.2TB SAS HDDs for capacity. Tests were run with 140 VMs (35 VMs per node), each with 4 vCPUs, 4 GB RAM, and one 20GB disk, and running RHEL version 7.2. The working set size was 2.8 TB. Tests were run for a minimum of one hour, with a five-minute ramp-up before each test and a minimum one-hour cool-down between tests.
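The working set figure follows directly from the VM count and per-VM disk size. The short sketch below reproduces that arithmetic and compares it with total cache capacity, assuming (the text implies but does not state this) that the single 480GB cache SSD is per node. The function name is hypothetical.

```python
def working_set_tb(vm_count, disk_gb_per_vm):
    """Total test data across all VMs, in decimal TB (1 TB = 1000 GB)."""
    return vm_count * disk_gb_per_vm / 1000


nodes = 4
vms_per_node = 35
vm_count = nodes * vms_per_node            # 140 VMs in the cluster

hybrid_working_set = working_set_tb(vm_count, 20)

# Assumption: one 480GB cache SSD in each of the four nodes.
total_cache_tb = nodes * 480 / 1000

print(hybrid_working_set)                  # 2.8
print(hybrid_working_set > total_cache_tb) # True: the data set exceeds the cache
```

Under that assumption, the working set is larger than the combined cache tier, which is consistent with the stated goal of exercising the back-end disks rather than cache alone.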
Comparative HCI solutions were also 2U, four-node systems with similar configurations, although all used two cache SSDs while HyperFlex used only one. Vendor A used two 400GB SSDs and four 1TB SATA HDDs; Vendor B used two 400GB SSDs and 12 1.2TB SAS HDDs; and Vendor C used four 480GB SSDs and 12 900GB SAS HDDs.
Testing was performed using various read/write profiles and block sizes, with 100% random data. VMs by nature generate random I/O by combining I/O from multiple applications and workloads. ESG Lab focused on results obtained using workloads designed to simulate real-world applications such as 4KB and 8KB OLTP and SQL Server.
First, ESG Lab looked at overall cluster scalability. The test began with a synthetic workload designed to emulate a typical OLTP I/O mix, 70% read, 100% random with a per-VM target of 800 IOPS. The test was run across 140 VMs in each cluster for three to four hours with a goal of remaining at or below 5ms write latency. As shown in Figure 3, HyperFlex was the only platform to complete this test with 140 VMs and stay below 5ms (4.95ms). For each of the other clusters, the test was rerun against decreasing numbers of virtual machines until write latency of 5ms was achieved. Vendor A successfully supported 70 VMs at 4.65ms average response time, Vendor B passed running 36 VMs with 5.37ms average response times, and Vendor C supported 48 VMs at sub 5.02ms response times.
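The scale-down procedure described above (rerun with fewer VMs until write latency meets the 5ms goal) can be sketched as a simple search. The step size, the function names, and the latency model are all assumptions made for illustration; the report does not state how VM counts were decremented, and `toy_latency` is a made-up stand-in for an actual benchmark run.

```python
def max_vms_within_latency(measure_write_latency_ms, start_vms, limit_ms=5.0, step=1):
    """Reduce the VM count until measured write latency meets the limit.

    measure_write_latency_ms is a caller-supplied function (hypothetical) that
    runs the workload at the given VM count and returns average write latency
    in milliseconds. Returns (vm_count, latency_ms), or None if no VM count
    meets the limit.
    """
    vms = start_vms
    while vms > 0:
        latency = measure_write_latency_ms(vms)
        if latency <= limit_ms:
            return vms, latency
        vms -= step
    return None


def toy_latency(vms):
    # Made-up linear model for illustration only; not derived from the report.
    return vms / 16.0


# Aggregate IOPS target at full scale: 140 VMs x 800 IOPS per VM.
print(140 * 800)  # 112000
print(max_vms_within_latency(toy_latency, start_vms=140))
```

In practice each call to the measurement function would be a multi-hour run with a cool-down in between, so a coarser step or a bisection over the VM count would reduce the number of runs needed.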
Next, ESG Lab examined the same synthetic workload against 140 virtual machines to measure the latency of each cluster against IOPS. As seen in Figure 4, the Cisco HyperFlex cluster more than doubled the IOPS of Vendor A and supported nearly eight times the IOPS of Vendor B and Vendor C with an average response time of 2.46ms. In comparison, Vendor A’s average response time was 6.61ms, Vendor B’s was 21.88ms, and Vendor C’s was the highest, at 44.45ms.
Next, ESG Lab looked at a synthetic workload designed to simulate SQL Server I/O patterns.4 Vdbench was used to create a synthetic workload that exercised different transfer sizes and read/write ratios. In the Vdbench profile, the deduplication ratio was set to two with a unit size of 4 KB and the compressibility ratio also set to two. Again, the test was run with 140 virtual machines.
As Figure 5 shows, the Cisco HyperFlex cluster nearly doubled the IOPS of both Vendor A and Vendor B and delivered more than five times the IOPS of Vendor C. Cisco HyperFlex posted an average response time of 8.2ms. By way of comparison, Vendor A’s average response time was 30.6ms, Vendor B’s was 12.8ms, and Vendor C’s was 10.33ms.
ESG Lab also looked at performance of all-flash configurations of Cisco HyperFlex and Vendor B, a software-based HCI offering running Cisco C240 M4 Rack Servers. All-flash testing used a four-node, Cisco HyperFlex 220C cluster with one 400GB SSD and six 960GB SSDs. The comparative four-node cluster used twice the cache—two 400GB SSDs—and the same number (six) of 960GB SSDs. It’s important to note that Vendor B’s system was configured with the same CPU and memory configuration as in the Cisco HyperFlex 220C cluster.
Testing again used 140 VMs per cluster (35 per node). Each VM, running RHEL 7.2, leveraged four vCPUs, 4 GB RAM, a 16GB local disk, and one 40GB raw disk. The working set was 5.6 TB, and I/O was 100% random; tests were run with a five-minute warmup, a one-hour test run, and a one-hour cluster cool-down between tests. While deduplication and compression are always enabled on the Cisco HyperFlex cluster, tests were run against Vendor B with deduplication and compression set to 50%, and again with both disabled. As shown in Figure 6, the Cisco HyperFlex cluster supported more IOPS at lower latency than Vendor B with or without deduplication enabled.
Next, ESG Lab looked at a synthetic workload designed to simulate SQL Server I/O against all-flash configurations of Cisco HyperFlex and Vendor B. Vdbench was used to create a workload that exercised different transfer sizes and read/write ratios. In the Vdbench profile, the deduplication ratio was set to two with a unit size of 4 KB and the compressibility ratio also set to two.
As Figure 7 shows, the Cisco HyperFlex cluster more than tripled the IOPS of Vendor B with an average response time of 5.3ms. Vendor B’s average response time was 30.58ms, due to an extremely high write response time of 99.84ms throughout the test. This test was run several times on multiple days with consistent results.
An interesting observation was made during all-flash testing. Vendor B showed considerable variability in performance from VM to VM. This test was run using HCIBench against 140 VMs in each cluster. While Cisco HyperFlex showed little variation across all 140 VMs—IOPS stayed very close to 600—Vendor B’s IOPS varied wildly, from a low of 64 to a high of 1,024 IOPS.
It’s important to note that this variability was observed in every iteration of testing, and that no form of storage QoS was used during these test runs on either of the clusters. Network QoS was used for both systems. Inconsistency like this could be quite problematic for administrators, who would likely need to use some form of QoS (if available from the HCI vendor) to attempt to control the VMs that are consuming more than their share, so others are not starved.
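One way to quantify this kind of VM-to-VM variability is to compare the minimum, maximum, and coefficient of variation of per-VM IOPS. The sketch below is illustrative only: `iops_spread` is a hypothetical helper, and apart from the reported extremes (64 and 1,024) and the approximate 600 IOPS level, the sample values are invented.

```python
from statistics import mean, pstdev


def iops_spread(per_vm_iops):
    """Summarize how evenly IOPS are distributed across VMs.

    Assumes every VM recorded a nonzero IOPS value.
    """
    avg = mean(per_vm_iops)
    return {
        "min": min(per_vm_iops),
        "max": max(per_vm_iops),
        "max_to_min": max(per_vm_iops) / min(per_vm_iops),
        "cv": pstdev(per_vm_iops) / avg,  # coefficient of variation
    }


# Only the extremes (64 and 1024) come from the report; the rest is made up.
uneven = [64, 300, 600, 900, 1024]
# Made-up values clustered near 600 IOPS to represent a consistent cluster.
even = [598, 600, 601, 600, 601]

print(iops_spread(uneven)["max_to_min"])  # 16.0
print(iops_spread(even)["cv"] < iops_spread(uneven)["cv"])  # True
```

A max-to-min ratio near 1 and a small coefficient of variation indicate that no VM is being starved by its neighbors.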
Why This Matters
A common complaint about HCI systems has been performance. HCI customers have been more focused on cost-efficiency and simpler management, often relegating HCI to tier-2 workloads. IT departments are unlikely to saddle their tier-1 production applications with high latency and inconsistent, “noisy neighbor” VM performance that some HCI solutions offer.
ESG Lab validated that Cisco HyperFlex hybrid and all-flash systems delivered higher, more consistent performance than other similarly configured HCI solutions using simulated OLTP and SQL workloads. For hybrid clusters, HyperFlex not only consistently outpaced competitors in terms of IOPS and latency, but it also supported more than twice as many VMs as both software-based and engineered proprietary systems while maintaining high performance.
The HyperFlex all-flash cluster, with always-on deduplication and compression, delivered higher IOPS and lower latency than a competitor with and without data reduction turned on. Equally important, HyperFlex all-flash performance was consistent across all VMs in the cluster, eliminating the need for storage QoS to ensure user satisfaction. In contrast, individual VMs in the competitive cluster received widely varying IOPS, indicating significantly better performance for some VMs than others.
The Bigger Truth
Hyperconverged infrastructures, while becoming mainstream, have long been considered more appropriate for tier-2 workloads. When asked in 2016 why they would choose converged infrastructure over hyperconverged, ESG research survey respondents’ most-often-cited (54%) response was better performance. In addition, 32% of respondents believed converged, i.e., loosely integrated independent components packaged together, was better for mission-critical workloads.5
Fast forward to 2018, and the picture has shifted, with only 24% of respondents citing performance as a reason to choose converged, while just 22% believe converged is better suited to tier-1 workloads.6
Cisco—clearly an “established player”—has an answer to those deficiencies. HyperFlex provides the typical benefits of HCI—it is cost-effective, simple to manage, and lets organizations start small and scale. But it also provides the performance that mission-critical, virtualized workloads demand. The consistency of performance over time and across all VMs in a cluster was particularly notable. In addition, its independent resource scalability enables organizations to adapt quickly to changing requirements, as today’s environments demand.
Cisco HyperFlex HCI solutions are highly integrated, fully engineered systems powered by the latest generation of Intel Xeon processors and provide pre-integrated clusters that include the network fabric, data optimization, unified servers, and choice of hypervisor including VMware ESXi/vSphere and Microsoft Hyper-V, enabling fast deployment. This makes them simple to manage and scale. ESG Lab validated that HyperFlex provides consistent high performance for VMware environments, across hybrid and all-flash clusters. HyperFlex outpaced multiple anonymous competitive solutions with higher IOPS, lower latency, and better consistency over time and across VMs.
The test results presented in this report are based on applications and benchmarks deployed in a controlled environment with industry-standard testing tools. Due to the many variables in each production data center environment, capacity planning and testing in your own environment are recommended. While the methodology in these tests was more stringent than most, you are well advised to always explore the details behind any vendor testing to understand its relevance to your environment.
When market evolution changes the buying criteria in an industry, there is often a mismatch between what customers want and what they can get. Vendors that can see what’s missing and fill the void gain an advantage. Cisco delivers an HCI solution that provides the essential simplicity and cost-efficiency features of HCI, but also the consistent high performance that has been missing—and that customers need for mission-critical workloads. HyperFlex supports VMware and Microsoft on-premises virtualized environments, and expansion to bare metal, containerized, and multi-cloud environments.
HCI solutions have been focused on second tier workloads, but with the consistent high performance offered by Cisco HyperFlex, there is no reason HCI cannot support tier-1 production workloads. Cisco HyperFlex could be the right solution at the right time for organizations seeking cost-effective, scalable, high performance infrastructure solutions.
1. Source: ESG Master Survey Results, Converged and Hyperconverged Infrastructure Trends, October 2017.
2. Source: Ibid.
3. When evaluating technology solutions, customers would be wise to understand the details behind vendor testing. Timing of test runs, volumes of data, and other details will impact performance results; these results may or may not be relevant to the customer environment.
4. A publicly available Vdbench profile was used to simulate the I/O and data patterns produced by SQL Server, and these results should not be interpreted as SQL application measurements.
5. Source: ESG Research Report, The Cloud Computing Spectrum, from Private to Hybrid, March 2016.
6. Source: ESG Master Survey Results, Converged and Hyperconverged Infrastructure Trends, October 2017.