Networked storage is being deployed in conjunction with server virtualization by a growing number of organizations interested in consolidation, reduced costs, and improved flexibility and availability of mission-critical applications including databases and e-mail. ESG research indicates that IT managers looking to reap the benefits of server and storage consolidation are concerned about performance. This ESG Lab report presents the results of a mixed workload performance benchmark test designed to assess the real-world performance capabilities of an IBM Storwize V7000 storage system and IBM x3850 X5 servers in a VMware-enabled virtual server environment.
Published: February 7, 2011
Server virtualization has made giant strides over the past decade, creating heroes inside IT organizations. Accordingly, it's no surprise that interest in the technology remains as strong as ever. Indeed, respondents to a recent ESG survey ranked "increased use of server virtualization" as their number one IT priority over the next 12-18 months. However, despite the broad success of server virtualization, nagging issues and challenges exist. As a result, a low percentage of the potential workloads that can be virtualized have been migrated to virtual machines, and the consolidation ratios of virtual machines per physical server remains relatively low.
A recent ESG survey of North American enterprise and larger midmarket IT professionals explored the storage challenges associated with the next wave of server virtualization. Given the rapid growth in the number of virtual machines being deployed, it's no surprise that scalability, performance, and the overall volume of storage capacity have been identified as key challenges.
This ESG Lab report examines the performance of real-world application workloads running in a virtualized and consolidated IT environment that leverages the following technologies:
The capabilities of the IBM servers and storage used during this evaluation are summarized in Figure 2. Rack-mounted IBM x3850 X5 servers support up to 24 Intel Xeon X7542 processor cores and up to 1.5 TB of RAM with an optional 1U Max5 expansion unit. The IBM Storwize V7000 storage system supports up to 240 drives (SAS, SSD, Nearline SAS) and is equipped with up to 16 GB of cache. The Storwize V7000 leverages mature IBM-developed SAN Volume Controller (SVC) technology to provide a variety of advanced storage functions including external virtualization, thin provisioning, flash copy, and remote mirroring. The graphically rich and intuitive Storwize V7000 management interface was derived from the IBM XIV product line. Easy Tier uses sub-LUN data tiering technology to automatically move frequently used data to high-performance SSDs and infrequently used data to lower-cost hard disk drives (HDDs).
This report documents the performance capabilities of IBM Storwize V7000 storage systems running a mix of real-world applications in a VMware vSphere-enabled virtual server environment powered by a pair of IBM x3850 X5 servers. In particular, this report explores how:
o 54,208 mailboxes were supported using the Microsoft Exchange 2010 Jetstress utility
o 5,015 database IOPS were sustained for small OLTP transfers with the Oracle Orion utility
o 849 MB/sec of throughput was delivered for large OLAP Oracle Orion operations
o 5,015 simulated web server IOPS were serviced
o 644 MB/sec of throughput was sustained for simulated backup jobs
o predictably fast response times and scalability were maintained throughout
o 33% faster Exchange database response times were recorded
o 43% faster Oracle OLTP IO response times were recorded
The predictably fast, mixed workload performance scalability of the virtualized environment tested by ESG Lab is summarized in Figure 3. The results will be explored in detail later in this report, but for now it should be noted that the performance of the Storwize V7000 scaled well as a mix of real-world application workloads ran in parallel on up to 16 virtual machines.
The balance of this report explores how mixed workload testing was accomplished, what the results mean, and why they matter to your business.
The real-world performance capabilities of the IBM Storwize V7000 were assessed by ESG Lab at an IBM facility located in Tucson, Arizona. The methodology presented in this report was designed to assess the performance capabilities of a single IBM Storwize V7000 storage system shared by multiple virtual servers running a mix of real-world application workloads.
Conventional server benchmarks were designed to measure the performance of a single application running on a single operating system inside a single physical computer. SPEC CPU2000 and CPU2006 are well known examples of this type of server benchmarking tool. Much like traditional server benchmarks, conventional storage system benchmarks were designed to measure the performance of a single storage system running a single application workload. The SPC-1 benchmark, developed and managed by the Storage Performance Council with IBM playing a key role, is a great example. SPC-1 was designed to assess the performance capabilities of a single storage system as it services an online interactive database application.
Traditional benchmarks running a single application workload can't help IT managers understand what happens when a mix of applications are deployed together in a virtual server environment. To overcome these limitations, VMware created a mixed workload benchmark called VMmark. VMmark uses a tile-based scheme for measuring application performance and provides a consistent methodology that captures both the overall scalability and individual application performance of a virtual server solution. As shown in Figure 4, compared to a traditional benchmark, which tests a single application running on a single physical server, VMmark measures performance as a mix of application workloads are run in parallel within virtual machines deployed on the same physical server.
The novel VMmark tile concept is simple, yet elegant. A tile is defined as a mix of industry standard benchmarks that emulate common business applications (e.g., e-mail, database, web server). The number of tiles running on a single machine is increased until the server can no longer deliver acceptable performance. A score is derived so that IT managers can compare servers with a focus on their performance capabilities when running virtualized applications.
The IBM x3850 X5 server used during this ESG Lab Validation has an excellent published VMmark score of 71.85@49 tiles. At a high level, this means that the IBM x3850 did 71.85 times more work than the dual-processor, single-core server that VMware used as a reference when VMmark was first released in 2007. At a lower level, the results indicate that a score of 71.85 was achieved while running 49 tiles. Since each tile is six virtual machines, this means that a score of 71.85 was achieved while running 294 virtual machines (6 × 49). In general, higher scores indicate better mixed workload performance, regardless of the tile count. To put these results into perspective, the x3850 X5 scored 3.5 times higher than the previous generation x3850 M2 server using only 2.7 times the processor cores (64 vs. 24 cores).
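The arithmetic behind the published result can be sketched as follows; the figures are those cited in the text, while the variable names are purely illustrative:

```python
# Sketch of the VMmark result arithmetic described above.
score = 71.85          # published VMmark score for the x3850 X5
tiles = 49             # tile count at that score
vms_per_tile = 6       # each VMmark tile comprises six virtual machines

total_vms = tiles * vms_per_tile
print(total_vms)       # 294 virtual machines (6 x 49)

# Generational comparison cited in the text:
x5_cores, m2_cores = 64, 24
core_ratio = x5_cores / m2_cores   # ~2.7x the cores for a 3.5x higher score
print(round(core_ratio, 1))
```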
While VMmark is well suited for understanding the performance of a mix of applications running on a single server, it was not designed to assess what happens when a mix of applications is run on multiple servers sharing a single storage system. VMmark tends to stress server internals more than it does the storage system. The methodology presented in the balance of this report was designed to stress the storage system more than the servers. Taking a cue from the VMmark methodology, a tile-based concept was used. As shown in Figure 5, each tile is composed of a mixture of four application workloads. Two physical servers, each configured with eight virtual machines, were used to measure performance as the number of active tiles was increased from one to four.
The difference between the server-focused VMmark benchmarking and storage-focused ESG Lab benchmarking is shown in Figure 6. Note how VMmark testing is performed with a single server, often attached to multiple storage systems. As a matter of fact, the IBM x3850 X5 VMmark results presented earlier in this report were achieved with four IBM System Storage DS4800 arrays. In other words, when vendors publish VMmark results, they make sure there is plenty of storage available so they can record the highest VMmark server score. This provides IT managers with a fair comparison of the performance capabilities of competitive server technologies.
ESG Lab storage-focused benchmarking uses a different approach. Instead of testing with a single server and more than enough storage, multiple servers are attached to a single storage system. Rather than running application-level benchmarks which stress the CPU and memory of the server, lower level industry standard benchmarks are used with a goal of measuring the maximum mixed workload capabilities of a single storage system.
Industry standard benchmarks were used to emulate the IO activity of four common business application workloads:
Each of the four workloads ran in parallel, with the Jetstress e-mail test taking the longest to complete (approximately three hours). The settings for each of the benchmarks are documented in the appendix.
VMware vSphere version 4.1 was installed on a pair of powerful IBM x3850 X5 servers, each with four six-core processors and a pair of dual-port host adapters. A Storwize V7000 with 216 300 GB 10K RPM SAS drives was connected to the servers through a pair of 8 Gbps FC switches as shown in Figure 7.
The Storwize V7000 provided all of the storage capacity used during testing, including VMware virtual disk files (VMDK), Windows Server 2008 R2 operating system images, application executables, and application data. Disks were configured to use VMware paravirtual SCSI (PVSCSI) adapters. The operating system images were installed on VMDK volumes. All of the application data volumes under test were configured as mapped raw LUNs (also known as raw device mapped, or RDM, volumes).
Each of the Exchange instances was configured with an eight-drive RAID-10 database volume and a five-drive RAID-5 log volume. The web server and backup reader workloads ran against eight-drive RAID-10 volumes. The operating system volumes were configured using a 4+1 RAID-5 layout. The Oracle volumes were configured with a combination of RAID-10 and RAID-5 volumes (four RAID-10 4+4 and four RAID-5 4+1). Application-level storage pools were used and volumes were striped over multiple mdisks. For example, the Exchange database volumes were configured using a storage pool composed of four RAID-10 mdisks, each with eight drives. Each of the Exchange VMs had a volume that was striped across all of the disks in that storage pool.
Volume ownership was balanced across the dual controllers within the Storwize V7000 and distributed evenly over the eight host interfaces. The volumes were spread evenly over two VMware host groups with a round robin multipath policy. The drive configuration is summarized in Table 1.
Why This Matters
ESG research indicates that storage scalability and performance are significant challenges for the growing number of organizations embracing server virtualization technology. Storage benchmarks have historically focused on one type of workload (e.g., database or e-mail) and one key performance metric (e.g., response time or throughput). Server benchmarks have typically tested only one server running a CPU-intensive workload that doesn't stress storage. To help IBM customers understand how a Storwize V7000 performs in a virtual server environment, this benchmark was designed to assess how real-world applications behave when running on multiple virtualized servers sharing a single storage system.
In a way, storage system benchmark testing is like an analysis of the performance of a car. Specifications including horsepower and acceleration from 0 to 60 are a good first pass indicator of a car's performance. But while specifications provide a good starting point, there are a variety of other factors that should be taken into consideration including the condition of the road, the skill of the driver, and gas mileage ratings. Much like buying a car, a test drive with real-world application traffic is the best way to determine how a storage system will perform.
Performance analysis began with an examination of the low level aggregate throughput capabilities of the test bed. This testing was performed using the Iometer utility running within the eight virtual machines that were used later during mixed workload testing. The eight virtual machines accessed Storwize V7000 storage through eight 8 Gbps FC interfaces.
Iometer access definitions, which measured the maximum throughput from disk, were used for this first pass analysis of the underlying capabilities of the Storwize V7000. Similar to a dynamometer horsepower rating for a car, maximum throughput was used to quantify the power of a turbo-charged Storwize V7000 storage engine. As shown in Figure 8, ESG Lab recorded a maximum throughput of 2.98 GB/sec.
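As a back-of-the-envelope sanity check on that figure, the measured throughput can be compared against the theoretical ceiling of the Fibre Channel fabric. The per-link usable bandwidth below assumes roughly 800 MB/sec for an 8 Gbps FC link after 8b/10b encoding overhead; it is an estimate, not a measured value:

```python
# Rough comparison of measured throughput vs. the FC fabric ceiling.
links = 8                    # eight 8 Gbps FC host interfaces (Figure 7)
mb_per_link = 800            # ~usable MB/sec per 8 Gbps link (assumption)

fabric_ceiling_gb = links * mb_per_link / 1024
measured_gb = 2.98           # aggregate throughput recorded by ESG Lab

print(round(fabric_ceiling_gb, 2))                    # ~6.25 GB/sec ceiling
print(round(100 * measured_gb / fabric_ceiling_gb))   # % of fabric ceiling
```

The measured 2.98 GB/sec lands well under the fabric ceiling, which suggests the result reflects the capability of the storage engine and drives rather than a saturated interconnect.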
Why This Matters
A storage system needs a strong engine and well-designed modular architecture to perform predictably in a mixed real-world environment. One measure of the strength of a storage controller engine is its maximum aggregate throughput. ESG Lab testing of the Storwize V7000 in a VMware vSphere environment achieved 2.98 GB/sec of aggregate large block sequential read throughput.
In ESG Lab's experience, these are excellent results for a dual controller modular storage system. As a matter of fact, these results provide an early indication that the Storwize V7000 is well suited for virtual server consolidation and mixed real-world business applications.
Mixed application testing began with a quick analysis of server memory and CPU utilization to make sure that there were no bottlenecks between virtualized applications and the Storwize V7000. Memory and CPU utilization as reported by the VMware Infrastructure Manager are shown in Figure 9.
These screenshots were taken during the peak activity phase of the four-tile test. With memory and CPU utilization at less than 5%, there was no obvious bottleneck between virtualized applications and the Storwize V7000.
IOs per second, or IOPS, is a measure of the number of operations a storage system can perform in parallel. When a system is able to move a lot of IOPS, from disk and from cache, it will tend to be able to service more applications and users in parallel. Much like the horsepower rating for a car engine, the IOPS rating for a storage controller can be used as an indicator of the power of a storage system engine.
While IOPS out of a cache is typically a big number and can provide an indication of the speed of the front end of a storage controller, IOPS from disk is a more useful metric when determining the real-world performance of a storage system servicing a mix of business applications. For example, e-mail and interactive database applications tend to be random in nature and therefore benefit from good IOPS from disk. With that said, a mix of real-world applications tends to have random and sequential IO traffic patterns that may be serviced from disk or from cache.
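The relationship between IOPS, transfer size, and throughput that underlies this distinction can be sketched with a simple helper; the example transfer sizes are illustrative, not drawn from the test configuration:

```python
def throughput_mb_per_sec(iops: float, block_kb: float) -> float:
    """Aggregate throughput implied by an IOPS rate at a given transfer size."""
    return iops * block_kb / 1024

# The same IOPS rate implies very different bandwidth at different IO sizes:
print(throughput_mb_per_sec(5000, 8))    # small random OLTP-style IO: ~39 MB/sec
print(throughput_mb_per_sec(5000, 256))  # large sequential IO: ~1250 MB/sec
```

This is why random, small-block workloads such as e-mail and interactive databases are judged by IOPS from disk, while sequential workloads such as OLAP queries and backups are judged by MB/sec.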
ESG Lab measured IOPS performance as reported by the Storwize V7000 as the number of virtual machines running mixed real-world application workloads increased from four through sixteen. With a mix of random and sequential IO over 216 disk drives, the goal was not to record a big IOPS number; the goal with this exercise was an assessment of the scalability of the Storwize V7000 as an increasing number of applications are consolidated onto a single virtualized platform. The IOPS scalability during the peak period of mixed workload activity is shown in Figure 10.
Why This Matters
Predictable performance scalability is a critical concern when a mix of applications shares a storage system. A burst of IO activity in one application (e.g., a database consistency check) can lead to poor response times, lost productivity, and, in the worst case, lost revenue.
ESG Lab confirmed that the rate of IOs processed by the Storwize V7000 scales as more applications are deployed in a growing virtual server environment.
As noticed during IOPS monitoring, peaks of throughput activity could be correlated to the periodic behavior of real-world applications. Two bursts of aggregate throughput were observed: the first during the Oracle large MBPS test which simulates a throughput-intensive OLAP application and the second during the Jetstress database consistency check. A VMware vSphere view of mixed workload performance on one of the servers is shown in Figure 11.
Why This Matters
Storage benchmarks typically focus on response time sensitive interactive workloads or throughput-intensive sequential workloads, yet mixed real-world applications in virtualized environments are usually a mix of both. A burst of activity due to a search and index operation, a database query, a backup job, or a video stream can be extremely throughput-intensive. Deploying more storage systems or more hardware within each storage system is one way to avoid the potential performance impact of a throughput-intensive workload in a mixed environment, but this increases cost and complexity and defeats the goal of shared storage consolidation. ESG Lab observed a peak aggregate throughput of 1.9 GB/sec as an Oracle Orion OLAP job was running, while other applications ran in parallel with predictably good response times.
Having looked at the IOPS and throughput ratings of the turbo-charged Storwize V7000 engine, here's where the rubber meets the road as we examine performance at the application level. The output from each of the industry standard benchmark utilities was analyzed to determine the performance scalability and responsiveness of real-world applications running in a consolidated virtual environment.
The IO and performance efficiency of Microsoft Exchange have improved significantly over the years. Architectural improvements in Exchange 2010 including a new store schema, larger page sizes (8 KB to 32 KB), improved read/write coalescing, improved pre-read support, and increased cache effectiveness have reduced the number of IOs per user by up to 70% compared to Exchange 2007. ESG Lab typically uses a value of 0.5 IOPS per mailbox to emulate a typical worker when testing with Jetstress 2007. A value of 0.12 IOPS per mailbox was used during Jetstress 2010 testing to reflect this reduction in IOPS compared to Exchange 2007.
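Under these per-mailbox profiles, the aggregate database IOPS a Jetstress run must sustain scales directly with the mailbox count. The calculation below is a sketch using the values cited above, not Jetstress output:

```python
# Jetstress sizing arithmetic from the per-mailbox profiles cited above.
mailboxes = 54208
iops_per_mailbox_2007 = 0.5    # typical worker profile for Jetstress 2007
iops_per_mailbox_2010 = 0.12   # profile used for Jetstress 2010 in this test

target_iops = mailboxes * iops_per_mailbox_2010
print(round(target_iops))      # ~6,505 database IOPS to sustain

# Note: 0.12 vs. 0.5 is a 76% per-mailbox reduction, consistent with the
# "up to 70%" architectural improvement plus a conservative margin.
reduction = 1 - iops_per_mailbox_2010 / iops_per_mailbox_2007
print(round(100 * reduction))
```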
The Microsoft Jetstress 2010 utility was used to see how many simulated e-mail users could be supported by the Storwize V7000 during mixed workload testing. The number of IOPS and the response time for each database and log volume were recorded at the end of each Jetstress run. A response time of 20 milliseconds or less for database reads is required to pass the test. This threshold is defined by Microsoft as the point beyond which end-users will perceive the e-mail system as slow. The results are shown in Figure 12 and itemized in Table 2.