This ESG Technical Validation documents the results of recent Nutanix performance testing focused on real-world performance scalability and sustainability improvements in support of mission- and business-critical application, database, and end-user computing (EUC) workloads.
For business-critical applications and workloads, traditional infrastructure deployments are complex. Provisioning is often a slow, multi-step process that consumes days or weeks and involves multiple infrastructure teams. Lengthy upgrades require significant downtime, and it is difficult to keep up with patching across application instances. Creating and managing copies for multiple groups—test/dev, QA, and business intelligence—takes time and consumes costly space on storage arrays. Also, restore and recovery operations require hours or days of rolling back snapshots and log files across fragmented resources. Ultimately, the way application infrastructure is deployed can impact productivity, causing delays in time to value for organizations’ business activities. It’s no surprise that three-quarters (75%) of respondents to ESG’s annual technology spending intentions survey reported that IT is more complex than it was two years ago (see Figure 1).1
Hyperconverged technologies continue to replace legacy technology solutions, and organizations’ buying criteria have continued to expand. Buyers are looking past the original promise of simplicity and cost savings and also prioritizing requirements such as performance, scalability, and reliability—recognizing that technologies like the cloud and software-defined storage will be far less complex and more cost-effective than a traditional siloed approach. In another ESG research study, nearly half (46%) of respondents reported that they were using hyperconverged infrastructure (HCI) solutions, while 69% of respondents said they expect spending on hyperconverged technology to accelerate.2 This is not surprising, given the factors driving them to consider HCI. Deployment drivers have historically included improved scalability, total cost of ownership, ease of deployment, and simplified systems management. Organizations need a solution that can deliver both simplicity and consistent, mixed-workload performance for critical business workloads without the need to tweak and tune the environment.
Nutanix Cloud Platform
Nutanix is designed to deliver a complete, software-driven IT infrastructure stack with the agility, scalability, and simplicity of the cloud combined with the security, performance, and cost predictability of a traditional on-premises infrastructure. The architecture is a scale-out, fully distributed software platform leveraging web-scale engineering principles innovated by leading cloud companies such as Google, Facebook, and Amazon. The software integrates the compute, virtualization, and storage environments into a single solution. This integration eliminates the complexity of traditional SAN and NAS environments, costly special-purpose hardware, and the specialized skill sets they require. The Nutanix HCI platform with the new Blockstore and Intel SPDK technologies—combined with other technologies like the Autonomous Extent Store (AES), introduced in a prior version of Nutanix HCI—capitalizes on the optimized architecture of Nutanix HCI to accelerate performance. These innovations optimize for high-throughput and low-latency applications, and they are uniquely designed to fully leverage the benefits of new media such as NVMe and storage-class memory.
Object storage has gained wider adoption as one of the best ways to rapidly store and retrieve continuously growing data sets that are often machine-generated and scale to petabytes. Nutanix Objects is an S3-compatible, software-defined, scale-out object storage solution. From an infrastructure perspective, this frees organizations to scale high-performance tiers of compute and storage independently, decoupling cold or archived data. When data needs to be processed, it can be brought closer to compute resources on demand, with no impact on the user experience. SmartStore uses standard S3 APIs in Nutanix Objects to connect to the remote storage tier.
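As an illustration of that S3-based connection, pointing Splunk SmartStore at an S3-compatible object store is done with a remote volume definition in Splunk's indexes.conf. The bucket name and endpoint below are hypothetical placeholders, not values from the tested configuration:

```
# indexes.conf (sketch) — bucket and endpoint are hypothetical placeholders
[volume:remote_store]
storageType = remote
path = s3://smartstore-bucket
remote.s3.endpoint = https://objects.example.com

[main]
remotePath = volume:remote_store/$_index_name
```

Any index stanza that sets `remotePath` to the remote volume keeps only its hot cache locally, while warm data is written to the object store through the S3 API.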
Nutanix also offers software-defined, hyperconverged infrastructure for databases that provides simplicity, agility, high availability, and efficiency. A key feature that makes Nutanix a simple and effective platform for databases is the software tool called Era, which helps customers with complete database lifecycle management at the click of a button. Era enables you to deploy databases in minutes, configured with disaster recovery. It also provides simple, efficient copy creation; easy patching and upgrades; automatic refresh; and simple rollback to any point in time with Nutanix Era Time Machine.
With built-in best practices, Era delivers a distinct advantage for databases running on Nutanix HCI when compared with the traditional setup and tuning that can take days or weeks of administrative effort. The simplicity enables non-DBAs to provision complex, multi-cluster databases with ease. Era fits in well with the promise of hyperconverged infrastructure, which was designed to simplify infrastructure deployments for applications.
ESG Technical Validation
ESG validated the ease of use during a remote demonstration of Nutanix Era, including simplifying operations from a single pane of glass for provisioning, cloning and refresh of space-efficient copies, patching, and Time Machine recovery. Our first area of focus was relational database management systems (RDBMSs) with Nutanix Era.
RDBMS with Era: Simplicity
The dashboard view provides an overview of all database instances, including details of space savings, sources, clones by age, Time Machine snapshots, and alerts.
We quickly provisioned an Oracle database using four easy-to-navigate screens and several mouse clicks. We clicked on Database/Provision and selected Oracle as the engine; we had the choice of selecting a single instance or a multi-node cluster. Next, we chose the Nutanix cluster on which to place the database and selected the Oracle version, followed by choosing the compute profile (templated into small, medium, and large in terms of vCPUs, cores, etc.), network profile (vLAN), and public key for access. Next, we gave the database a name, selected the disk group size, and entered the storage system password. There were spaces available to insert pre- and post-commands if desired, such as for adding data masking. Finally, we specified the Time Machine Gold policy, which was configured to save 30 days of continuous transaction logs, plus 30 daily, four weekly, 12 monthly, and four quarterly snapshots. The last step was to click Provision, and the task began.
At the end of every task, Era offers an API Equivalent button, which displays the equivalent API calls, formatted for a choice of programming languages. In addition, details could be viewed via the blue icons on every screen.
Our demo also showed how easy patching was. It involved selecting a clone, clicking the Update Available message, choosing the upgrade from a list, and choosing whether to upgrade now or at a scheduled time. From the Operations screen, we could watch the provisioning and patching task steps being executed, with time stamps.
The Time Machine feature provides snapshot restore by rolling back to any point in time, down to the second, by creating a clone from the snapshot. For a CRM database, we viewed the calendar of snapshots, color coded for continuous, daily, weekly, monthly, and quarterly snapshots.
Restoring was simply a matter of selecting a date on the calendar, choosing either a daily snapshot or the hour/minute/second from which to restore, choosing the location on which to create the clone, and providing a name and database profile (small, medium, or large). Pre- and post-command and the API Equivalent button were also available.
Why This Matters
Databases are critical, business-driving applications for many organizations, for both transactional and analytical use cases. Traditional infrastructure deployments for databases cause complexity in provisioning, updating, cloning, and refreshing, causing delays that inhibit time to value.
ESG validated that Nutanix with Era simplifies database provisioning, cloning, refresh, patching, and restore and recovery from a simple GUI, with options for automation using the CLI or API. The interface is so simple and intuitive that non-DBAs can easily accomplish any task across the entire database lifecycle. Also, Time Machine functionality dramatically simplifies restore and refresh to any point in time.
RDBMS with Era: Performance
ESG audited complete and detailed results from performance tests using a four-node Nutanix NX-8170-G7 cluster—populated with eight Intel DC P4510 Series 4TB NVMe devices per node—that examined both synthetic raw performance and realistic database workloads. The testing used a Nutanix tool to demonstrate the raw performance capabilities of the platform, plus industry-standard database workload generation tools that exercised the Nutanix HCI using live SQL Server and Oracle databases. The workloads examined in this report include:
- Raw Performance — This test generated random reads and random writes, with a goal of demonstrating peak burst performance.
  - I/O Profile — 8KB random reads and writes, 1MB sequential reads and writes.
- SQL Server Performance —
  - I/O Profile — Dell’s Benchmark Factory was used to generate an OLTP database workload that emulated users in a typical online brokerage firm as they generated trades, performed account inquiries, and executed market research. The workload was composed of multiple transaction types with a defined ratio of execution—some performed database updates, requiring both read and write operations, while others were read-only. The estimated read/write I/O ratio was 90% reads to 10% writes.
- Oracle I/O Performance —
  - The Silly Little Oracle Benchmark (SLOB) was used to efficiently generate realistic system-wide, random, single-block, and application-independent SQL queries. The tool exercised all components of the server and storage subsystems by stressing the physical I/O layer of Oracle through SGA-buffered random I/O, without being limited to a specific load-generating application.
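The raw-performance I/O profiles above can be approximated with a generic open-source load generator such as fio. The report itself used a Nutanix tool, so this sketch is an illustrative stand-in rather than the actual test harness; queue depths and job counts are assumptions:

```python
# Sketch: fio command lines matching the raw-performance I/O profiles
# (8KB random, 1MB sequential). fio is a stand-in for the Nutanix tool
# used in the report; iodepth/numjobs values are illustrative.

def fio_args(name, rw, bs, runtime_s=60, iodepth=32, numjobs=4):
    """Return a fio command line for one I/O profile."""
    return [
        "fio",
        f"--name={name}",
        f"--rw={rw}",              # randread, randwrite, read, or write
        f"--bs={bs}",              # block size
        "--ioengine=libaio",
        "--direct=1",              # bypass the page cache
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
    ]

# The four profiles described above:
profiles = {
    "rand-read": fio_args("rand-read", "randread", "8k"),
    "rand-write": fio_args("rand-write", "randwrite", "8k"),
    "seq-read": fio_args("seq-read", "read", "1m"),
    "seq-write": fio_args("seq-write", "write", "1m"),
}
```

Each command line could be handed to a process runner on a test VM; `--direct=1` matters here, since peak-burst numbers are meaningless if the host page cache absorbs the I/O.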
First, we tested the cluster’s raw IOPS performance, a common assessment of basic horsepower of the system, and compared results to testing performed in 2017. The system tested in 2017 was an all-flash Nutanix NX-3460-G5, four-node cluster running Nutanix HCI 5.0 with two Intel Xeon E5-2680v4 processors (14 cores at 2.4 GHz), 256GB RAM, and six 1.92TB SSDs per node. The current Nutanix system under test was a four-node Nutanix NX-8170-G7 cluster running Nutanix HCI 5.15 LTS with two Intel Xeon 8280 processors (28 cores at 2.7GHz), 768GB RAM, and eight 2TB NVMe devices per node.
As shown in Figure 7, Nutanix showed a 5.3x performance improvement in random reads, and a 4.3x improvement in random writes.
SQL Server Performance
Next, we compared SQL Server OLTP performance between the same two systems. The current tests used the latest software stack: Windows 2019, SQL Server 2019 CU6, and Benchmark Factory 8.3. Four agents were used to generate a total of 80 concurrent users per VM (totaling 320 cluster-wide users), so that all users interacted with the database as quickly as possible (no think time). Test runs were completed for each VM count (one to four) to highlight predictable performance scalability as the demanding OLTP workload exercised more resources in the cluster. It should be noted that IOPS and transactions/sec do not have a 1:1 correspondence. In most cases, a single transaction comprises multiple read and write I/O operations. Another important metric difference is latency. Storage latency is often associated with IOPS, while the transaction response time as reported in this analysis is specific to the OLTP workload, which exercises both compute and storage. As shown in Figure 8, ESG analyzed the transactions/sec and average transaction response in seconds.
ESG reviewed data showing consistent performance scaling as the concurrent database instances increased from one to four, while average transaction response times remained low. The total number of transactions per second (TPS) averaged 6,559 per database instance, with the lowest-yielding SQL Server VM producing 5,844 TPS and the highest-yielding SQL Server VM producing 7,010 TPS.
This showed a twofold benefit: not only near-linear OLTP performance scalability with a remarkably small variance of just 6% between all instances as more nodes were added, but also an even workload distribution that predictably consumed resources without impacting the other SQL Server instances. Even more impressive was the average transaction response time. The Nutanix solution consistently delivered ultra-fast speeds of 0.012 seconds per transaction with all four nodes running the workload.
The average I/O results were gathered during the execution of the test when all four virtual machines were running the workload simultaneously. The small increase in write latency is most likely due to the much higher transaction rate being processed by the system. Said another way, transactions increased by 146% at the expense of only 1.8% higher storage write latency. Such a small penalty could likely be offset with tuning in SQL Server or the Nutanix platform, but with a 61% reduction in transaction response time, this is an outstanding performance result.
Oracle Performance Driven by SLOB
Next, ESG compared results of an insert/update/read workload driven by SLOB running on an Oracle database between our modern cluster of four NX-8170 nodes and an all-flash Nutanix NX-9460-G4 cluster tested in 2017. The NX-9460-G4 cluster contained dual Intel Haswell E5-2680v3 processors (12 cores at 2.5GHz), 256 GB of RAM, and six 1.6TB SSDs. Eight total VMs (running Red Hat Enterprise Linux [RHEL] 7.2 with six vCPUs and 32 GB of RAM) were configured with a single-instance Oracle database. Each VM was given a 100GB vDisk for RHEL, a 100GB vDisk dedicated to the Oracle Cluster Registry (OCR), and 16 125GB vDisks for Oracle database data files and online redo logs. The NX-8170 four-node cluster ran an updated software stack: Oracle 19.3, Oracle Enterprise Linux 7.7, and SLOB.
Performance was recorded using Oracle Automatic Workload Repository (AWR) to provide the performance analysis from Oracle’s point of view, with Oracle’s data.
Additional performance highlights with one DB per Nutanix node include greater than 1,700 MB/s read throughput at 0.76 ms latency and greater than 630 MB/s write throughput at 1.23 ms latency.
Again, Nutanix demonstrated near-perfect linear scalability for reads and writes. Reads showed just a 5.6% variance and writes showed 6.3%. Latency stayed quite low throughout the tests, with an overall performance increase of 53%—again, an outstanding result.
When we increased Oracle VMs per node to 2:1, the system still improved performance significantly. We saw IOPS increase by 14% and bandwidth increase from 2,334 MB/s to 3,360 MB/s while keeping read latency below 1 ms.
Splunk SmartStore on Nutanix with Nutanix Objects
Next, ESG looked at Splunk workloads, comparing the economics of two scenarios. The first was Splunk “classic” on bare metal servers, with hot and warm data hosted locally on SSDs in the compute nodes and cold data stored on external storage. The second was Splunk SmartStore on Nutanix, where hot data is stored on SSDs local to the indexer nodes and Nutanix Objects holds warm and cold data. Both deployments were configured and tested to ensure they conformed to Splunk requirements for latency and performance.3 Fio was executed with a range of block sizes relevant to the Splunk workload—60% 4k, 20% 8k, and 20% 32k. Splunk requires shared storage systems to be able to provide 1,200 IOPS for indexers and 800 IOPS for the search head. It’s important to note that this test was not designed to demonstrate the maximum performance of a Nutanix node or cluster; the purpose was to validate performance at or above the levels required to run Splunk.
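The pass/fail criterion described above can be sketched as a simple check. The IOPS floors are Splunk's documented requirements from the paragraph above; the measured values in the usage lines are placeholders, not the report's results:

```python
# Sketch of the Splunk shared-storage check described above.
# Measured values used below are illustrative placeholders.

# fio's bssplit syntax for the 60/20/20 mix of 4k/8k/32k blocks:
FIO_BSSPLIT = "4k/60:8k/20:32k/20"

# Splunk's minimum shared-storage IOPS per role:
SPLUNK_MIN_IOPS = {"indexer": 1200, "search_head": 800}

def meets_splunk_requirement(role, measured_iops):
    """True when measured shared-storage IOPS meet Splunk's floor for a role."""
    return measured_iops >= SPLUNK_MIN_IOPS[role]

# Placeholder usage — not the report's measurements:
assert meets_splunk_requirement("indexer", 1500)
assert not meets_splunk_requirement("search_head", 600)
```

The `FIO_BSSPLIT` string is the argument fio's `--bssplit` option accepts to generate the mixed block-size profile in a single job, rather than running three separate fixed-block tests.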
Figure 10 shows the results of performance testing with Nutanix. Nutanix showed near linear scalability as the Splunk workloads were run on additional nodes in the cluster. In every case Nutanix was able to exceed Splunk’s requirements for shared storage IOPS, throughput, and latency. In these tests, latency averaged 4.7 ms for sequential I/O and 5.5 ms for random I/O. Nutanix handily demonstrated suitability for running Splunk index and search workloads on the same host.
To compare the economics of Splunk on bare metal with Splunk SmartStore on Nutanix, we modeled and compared three systems sized to ingest 1TB, 3TB, and 10TB per day. Splunk sizing inputs were as follows: cache/hot and warm data retention – 30 days, SmartStore/cold data retention – 3 years, replication factor – 2, and searchability factor – 2.
The bare metal systems were based on a traditional server vendor’s certified and bundled offering for Splunk. Servers had two Xeon Gold 5120 14-core CPUs, 128GB of RAM, and 8x 1.92TB SSDs each. The NAS system housed 42 TB of usable storage per node. The Splunk SmartStore on Nutanix configurations utilized Nutanix to house indexers and search heads in virtual machines and were configured with AOS PRO three-year licenses with production support, three-year hardware support contracts with production support, and a three-year dedicated Objects license with production support.
Splunk SmartStore on Nutanix demonstrated consistent cost savings over Splunk on bare metal servers, and the savings increased with the size of the environment. This is because the cost savings come mainly from offloading warm and cold storage to low-cost remote object storage. With SmartStore, the required cold storage capacity is reduced compared to a non-SmartStore deployment, since the remote storage service takes over responsibility for maintaining high availability. Unlike the bare metal option, the replication factor has no effect on how the remote storage service achieves that goal, so organizations can purchase less storage to hold the same volume of data.
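A back-of-the-envelope sketch of that capacity effect, assuming (purely for illustration) that Splunk stores roughly half of raw ingest on disk after compression; the 0.5 factor is an assumption, not a figure from the report:

```python
# Illustrative sketch of why SmartStore reduces purchased cold capacity.
# stored_fraction=0.5 (on-disk footprint after compression) is an
# assumption for illustration, not a figure from the report.

def classic_cold_tb(ingest_tb_day, retention_days, replication_factor=2,
                    stored_fraction=0.5):
    """Cold capacity a bare metal deployment must buy: every copy replicated."""
    return ingest_tb_day * retention_days * stored_fraction * replication_factor

def smartstore_cold_tb(ingest_tb_day, retention_days, stored_fraction=0.5):
    """With SmartStore, the object store owns redundancy, so the Splunk
    replication factor does not multiply purchased capacity."""
    return ingest_tb_day * retention_days * stored_fraction

# 1TB/day ingest with 3-year (1,095-day) cold retention, as modeled above:
classic = classic_cold_tb(1, 1095)    # 1095.0 TB
smart = smartstore_cold_tb(1, 1095)   # 547.5 TB
```

Under these assumptions, the bare metal deployment buys twice the cold capacity for the same data, and the gap widens linearly with daily ingest, which is consistent with the savings growing with environment size.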
ESG audited performance testing of a Nutanix NX-3060-G7 cluster that simulated a growing Citrix Virtual Apps and Desktops 7 deployment. Tests were designed to show the linear scalability of the Nutanix cluster and storage controller latency during both logon storms and steady-state operation. Testing was conducted using the industry-standard VDI benchmarking tool Login VSI, which validates application performance and response times for various predefined VDI workloads, with the ultimate goal of showing desktop density potential for a given set of hardware and software components.
Two Nutanix NX-3060-G7 blocks with a total of eight nodes formed the test bed cluster. To characterize VDI performance, the Login VSI Knowledge Worker workload was used. This workload simulates user behavior, using up to seven simultaneous well-known desktop applications like Microsoft Office, Internet Explorer, and Adobe Acrobat Reader, plus video.
Testing began with Login VSI on one node of a four-node cluster, with eight server VMs, to determine the VSImax score, which is the maximum number of users that can be hosted with response times under the threshold for an acceptable user experience.
VSImax was determined to be 175 users for one node. We then ran tests to determine average response time and latency as both the number of user sessions doubled—to 350, 700, and 1,400—and the number of VMs doubled (16, 32, and 64) at two, four, and eight nodes. Figure 12 shows the VSImax average response time in milliseconds4 as the number of nodes increased.
The results clearly show that the number of user sessions can linearly scale as the number of nodes increases, while maintaining predictable performance in terms of average response time. In other words, increasing the number of virtual desktops as the number of Nutanix nodes increases will not result in a degradation of application response times.
ESG then examined the storage controller latency during a logon storm of 1,400 virtual desktops on the eight-node Nutanix cluster. We examined the latency for both the logon period and steady-state operation (Figure 13). During the 48-minute logon period, user latency reached a maximum of approximately four milliseconds. The average storage latency during this period was calculated to be 2.67 milliseconds. Once all 1,400 virtual desktops were logged on, the average steady-state latency was calculated to be 4.6 milliseconds.
Why This Matters
Delivering high levels of performance is a requirement for IT environments that rely heavily on mission- and business-critical applications and databases. This is especially important in dynamic environments where data growth is constant and continuous accessibility is a requirement. The ability to easily meet these performance and scalability requirements is essential for anyone evaluating hyperconverged infrastructures. The challenge is that some organizations feel there is too much overhead between the virtualization layer and the essential underlying services that must always be running to not only ensure proper functionality of the hyperconverged infrastructure, but also meet strict application performance SLAs.
ESG confirmed that Nutanix HCI with Era significantly improves I/O efficiency and performance compared to previous generations. Nutanix has improved application performance and reduced latency, validated in synthetic and real-world testing. Our tests exercised both storage and compute to highlight the type of performance organizations can expect in their own OLTP database environments. Nutanix showed improvement in every test scenario, improving raw IOPS by nearly 5x overall, improving SQL Server transaction processing by 146%, and reducing response time by 61%. Oracle performance improved by 53% overall, while keeping response times extremely low.
ESG validated that Nutanix Hyperconverged Infrastructure has the required performance and features to support demanding Splunk environments, including the elasticity to provide fast expansion to meet ever-increasing ingest rates with no impact to users or applications. This can deliver faster time to value for search-oriented investigations and application monitoring, supported by a redundant storage fabric designed to increase business uptime. Splunk SmartStore on Nutanix demonstrated consistent cost savings over Splunk on bare metal servers, and because warm and cold storage is offloaded to low-cost remote object storage, the savings increased with the size of the environment.
ESG confirmed that Nutanix easily supported the demanding requirements of an end-user computing environment by delivering predictable performance for a 1,400 seat Citrix Virtual Apps and Desktops deployment across a single eight-node cluster. Nutanix easily handled the impact of the I/O bursts commonly associated with virtual desktop logon storms, while providing impressively low latency.
Taken together, all of this can easily translate into support for significantly higher density and performance of enterprise-scale databases and applications.
The Bigger Truth
In a world where organizations are leveraging digital transformation, DevOps, and agile development to drive greater efficiency and productivity, they need the cloud-like simplicity and scalability that HCI provides: minimized complexity and predictable costs for business- and mission-critical applications and databases with demanding performance SLAs. High levels of reliable and scalable enterprise-class performance are no longer optional.
To address these not-always-aligned challenges, Nutanix has:
- Made it simple to deploy and manage. Nutanix provides all the tools and dashboards to manage the environment from a single pane of glass, automatable via APIs.
- Streamlined the I/O stack while leveraging technologies like NVMe, up to 100GbE top-of-rack (ToR) switching support, and RDMA support to maximize performance.
- Provided for complete, simplified information lifecycle management for databases, advanced analytics, and end-user computing workloads.
ESG validated that Nutanix has addressed these issues with their latest generation Nutanix Cloud Platform HCI. Testing confirmed that Nutanix meets the demanding performance requirements of dynamic, mission-critical applications like databases, data analytics platforms, and end-user computing. The Nutanix HCI platform delivered significant IOPS and latency improvements in all our tests. Synthetic and real-world testing exercised both compute and storage resources to meet the high-transaction and low-latency demands of scalable OLTP database deployments in both Microsoft SQL Server and Oracle OLTP database environments. Nutanix also delivered predictably scalable performance for analytics workloads using Splunk SmartStore with Nutanix Objects and for a Citrix Virtual Apps and Desktops deployment, easily handling the impact of I/O bursts commonly associated with VDI logon storms, while providing impressively low latency. Significant economic advantages were validated using Splunk SmartStore on Nutanix as compared to Splunk on bare metal servers.
The results presented in this document are based on testing in a controlled environment. Due to the many variables in each production data center, it is important to perform planning and testing in your own environment to validate the viability and efficacy of any solution.
As hyperconverged technologies mature, Nutanix continues to expand the boundaries of what is possible by not only adopting and developing cutting-edge technology but also providing software that simplifies life for IT admins and DBAs alike. If your organization needs to modernize your IT infrastructure, ESG recommends a serious look at Nutanix HCI to provide your critical enterprise applications the benefits of today’s most highly performant compute and storage technology with the simplicity of HCI.
1. Source: ESG Research Report, 2021 Technology Spending Intentions Survey, January 2021.
2. Source: ESG Research Report, Data Storage Trends in an Increasingly Hybrid Cloud World, March 2020.
3. Source: Splunk Enterprise, Capacity Planning Manual.
4. VSImax average response time is calculated using the average of five Login VSI response time samples plus 40% of the number of active sessions.
ESG Technical Validations
The goal of ESG Technical Validations is to educate IT professionals about information technology solutions for companies of all types and sizes. ESG Technical Validations are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objectives are to explore some of the more valuable features and functions of IT solutions, show how they can be used to solve real customer problems, and identify any areas needing improvement. The ESG Validation Team’s expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments.