IBM Storwize V7000: Real-world Mixed Workload Performance in VMware Environments

Networked storage is being deployed in conjunction with server virtualization by a growing number of organizations interested in consolidation, reduced costs, and improved flexibility and availability of mission-critical applications including databases and e-mail. ESG research indicates that IT managers looking to reap the benefits of server and storage consolidation are concerned about performance. This ESG Lab report presents the results of a mixed workload performance benchmark test designed to assess the real-world performance capabilities of an IBM Storwize V7000 storage system and IBM x3850 X5 servers in a VMware-enabled virtual server environment.

Author(s): Brian Garrett

Published: February 7, 2011

The Challenges

Server virtualization has made giant strides over the past decade, creating heroes inside IT organizations. Accordingly, it's no surprise that interest in the technology remains as strong as ever. Indeed, respondents to a recent ESG survey ranked "increased use of server virtualization" as their number one IT priority over the next 12-18 months.[1] However, despite the broad success of server virtualization, nagging issues and challenges exist. As a result, only a small percentage of the workloads that could be virtualized have been migrated to virtual machines, and consolidation ratios of virtual machines per physical server remain relatively low.

A recent ESG survey of North American enterprise and larger midmarket IT professionals explored the storage challenges associated with the next wave of server virtualization.[2] Given the rapid growth in the number of virtual machines being deployed, it's no surprise that scalability, performance, and the overall volume of storage capacity have been identified as key challenges.

Figure 1. Server Virtualization Storage Challenges

The Solution

This ESG Lab report examines the performance of real-world application workloads running in a virtualized and consolidated IT environment that leverages the following technologies:

  • IBM Storwize V7000 storage systems: The IBM Storwize V7000 is a modular storage solution with built-in storage efficiency, exceptional ease of use, and predictably scalable performance that's ideally suited for growing virtual server environments.
  • IBM x3850 X5 servers: The IBM x3850 X5 is a rack-mounted server with extraordinary scalability, performance, and reliability.
  • VMware vSphere: VMware vSphere transforms IT infrastructures into a private cloud, enabling the automated delivery of IT infrastructure as a service.

The capabilities of the IBM servers and storage used during this evaluation are summarized in Figure 2. Rack-mounted IBM x3850 X5 servers support up to 24 Intel Xeon X7542 processor cores and up to 1.5 TB of RAM with an optional 1U Max5 expansion unit. The IBM Storwize V7000 storage system supports up to 240 drives (SAS, SSD, Nearline SAS) and is equipped with up to 16 GB of cache. The Storwize V7000 leverages mature IBM-developed SAN Volume Controller (SVC) technology to provide a variety of advanced storage functions including external virtualization, thin provisioning, flash copy, and remote mirroring. The graphically rich and intuitive Storwize V7000 management interface was derived from the IBM XIV product line. Easy Tier uses sub-LUN data tiering technology to automatically move frequently used data to high performance SSD drives and infrequently used data to lower cost hard disk drives (HDD).
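Conceptually, sub-LUN tiering tracks IO activity per extent and migrates the hottest extents to SSD. The following minimal sketch illustrates the general technique only; it is not IBM's actual Easy Tier algorithm, and the extent names and counters are hypothetical:

    # Illustrative sub-LUN tiering: promote the busiest extents to SSD.
    # (Conceptual sketch only; not IBM's Easy Tier implementation.)
    def plan_promotions(extent_io_counts: dict, ssd_slots: int) -> list:
        """Return the hottest extents, up to the SSD tier's capacity."""
        ranked = sorted(extent_io_counts, key=extent_io_counts.get, reverse=True)
        return ranked[:ssd_slots]

    heat = {"ext00": 91_000, "ext01": 1_200, "ext02": 64_500, "ext03": 800}
    print(plan_promotions(heat, ssd_slots=2))  # -> ['ext00', 'ext02']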

Figure 2. IBM Server and Storage Highlights

The Results

This report documents the performance capabilities of IBM Storwize V7000 storage systems running a mix of real-world applications in a VMware vSphere-enabled virtual server environment powered by a pair of IBM x3850 X5 servers. In particular, this report explores how:

  • A single Storwize V7000 attached to a pair of x3850 X5 servers running a mix of real-world application workloads in 16 virtual machines has the IO processing power and bandwidth to simultaneously support up to:

       o 54,208 mailboxes using the Microsoft Exchange 2010 Jetstress utility
       o 5,015 database IOs per second for small OLTP IOs with the Oracle Orion utility
       o 849 MB/sec of throughput for large OLAP Oracle Orion operations
       o 5,015 simulated web server IOPS
       o 644 MB/sec of throughput for simulated backup jobs
       o all with predictably fast response times and scalability

  • Easy Tier with 24 flash drives more than tripled the mixed IO capacity of the V7000 (3.21 times more) while noticeably improving application-level performance:

       o 33% faster Exchange database response times
       o 43% faster Oracle OLTP IO response times

  • In a virtual server environment, the Storwize V7000 achieved a maximum aggregate throughput of 2.98 GB/sec during bandwidth-intensive throughput testing and 1.9 GB/sec during mixed application workload testing.

The predictably fast, mixed workload performance scalability of the virtualized environment tested by ESG Lab is summarized in Figure 3. The results will be explored in detail later in this report, but for now it should be noted that the performance of the Storwize V7000 scaled well as a mix of real-world application workloads ran in parallel on up to 16 virtual machines.

Figure 3. Storwize V7000 Mixed Workload Scalability

The balance of this report explores how mixed workload testing was accomplished, what the results mean, and why they matter to your business.

ESG Lab Validation

The real-world performance capabilities of the IBM Storwize V7000 were assessed by ESG Lab at an IBM facility located in Tucson, Arizona. The methodology presented in this report was designed to assess the performance capabilities of a single IBM Storwize V7000 storage system shared by multiple virtual servers running a mix of real-world application workloads.

VMmark

Conventional server benchmarks were designed to measure the performance of a single application running on a single operating system inside a single physical computer. SPEC CPU2000 and CPU2006 are well known examples of this type of server benchmarking tool. Much like traditional server benchmarks, conventional storage system benchmarks were designed to measure the performance of a single storage system running a single application workload.  The SPC-1 benchmark, developed and managed by the Storage Performance Council with IBM playing a key role, is a great example. SPC-1 was designed to assess the performance capabilities of a single storage system as it services an online interactive database application.

Traditional benchmarks running a single application workload can't help IT managers understand what happens when a mix of applications is deployed together in a virtual server environment. To overcome these limitations, VMware created a mixed workload benchmark called VMmark. VMmark uses a tile-based scheme for measuring application performance and provides a consistent methodology that captures both the overall scalability and individual application performance of a virtual server solution. As shown in Figure 4, compared to a traditional benchmark, which tests a single application running on a single physical server, VMmark measures performance as a mix of application workloads runs in parallel within virtual machines deployed on the same physical server.

Figure 4. Traditional Benchmarking vs. VMmark Tile-based Benchmarking

The novel VMmark tile concept is simple, yet elegant. A tile is defined as a mix of industry standard benchmarks that emulate common business applications (e.g., e-mail, database, web server). The number of tiles running on a single machine is increased until the server is saturated. A score is derived so that IT managers can compare servers with a focus on their performance capabilities when running virtualized applications.

The IBM x3850 X5 server used during this ESG Lab Validation has an excellent published VMmark score of 71.85@49 tiles.[3] At a high level, this means that the IBM x3850 did 71.85 times more work than the dual-processor, single-core server that VMware used as a reference when VMmark was first released in 2007. At a lower level, the results indicate that a score of 71.85 was achieved while running 49 tiles. Since each tile is six virtual machines, this means that a score of 71.85 was achieved while running 294 virtual machines (6 x 49). In general, higher scores indicate better mixed workload performance, regardless of the tile count. To put these results into perspective, the x3850 X5 scored 3.5 times higher than the previous generation x3850 M2 server while using only 2.6 times more processor cores (64 vs. 24).
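To make the tile arithmetic concrete, here is a minimal sketch of how a VMmark 1.x result decomposes; the six-VMs-per-tile constant comes from the benchmark definition described above, while the helper function is purely illustrative:

    # Decompose a published VMmark 1.x result into its parts.
    VMS_PER_TILE = 6  # each VMmark tile bundles six application VMs

    def vmmark_summary(score: float, tiles: int) -> str:
        total_vms = tiles * VMS_PER_TILE
        return (f"score {score} at {tiles} tiles "
                f"= {total_vms} concurrent virtual machines")

    print(vmmark_summary(71.85, 49))
    # -> score 71.85 at 49 tiles = 294 concurrent virtual machines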

A Mixed Real-world Storage Benchmark Methodology

While VMmark is well suited for understanding the performance of a mix of applications running on a single server, it was not designed to assess what happens when a mix of applications is run on multiple servers sharing a single storage system. VMmark tends to stress server internals more than it does the storage system. The methodology presented in the balance of this report was designed to stress the storage system more than the servers. Taking a cue from the VMmark methodology, a tile-based concept was used. As shown in Figure 5, each tile is composed of a mixture of four application workloads. Two physical servers, each configured with eight virtual machines, were used to measure performance as the number of active tiles was increased from one to four.

Figure 5. ESG Lab Tile-Based Storage Benchmarking

The difference between the server-focused VMmark benchmarking and storage-focused ESG Lab benchmarking is shown in Figure 6. Note how VMmark testing is performed with a single server, often attached to multiple storage systems. As a matter of fact, the IBM x3850 X5 VMmark results presented earlier in this report were achieved with four IBM System Storage DS4800 arrays. In other words, when vendors publish VMmark results, they make sure there is plenty of storage available so they can record the highest VMmark server score. This provides IT managers with a fair comparison of the performance capabilities of competitive server technologies.

ESG Lab storage-focused benchmarking uses a different approach. Instead of testing with a single server and more than enough storage, multiple servers are attached to a single storage system. Rather than running application-level benchmarks which stress the CPU and memory of the server, lower level industry standard benchmarks are used with a goal of measuring the maximum mixed workload capabilities of a single storage system.

Figure 6. Server-focused VMmark vs. Storage-focused ESG Lab Benchmarking

Mixed Workloads

Industry standard benchmarks were used to emulate the IO activity of four common business application workloads:

  • E-Mail: The Microsoft Jetstress utility was used to generate e-mail traffic. Similar to the Microsoft LoadSim utility used in the VMmark benchmark, Jetstress simulates the activity of typical Microsoft Exchange users as they send and read e-mails, make appointments, and manage to-do lists. Jetstress is, however, a more lightweight utility than LoadSim. Using the underlying Jet Engine database, Jetstress was designed to focus on storage performance.
  • Database: The Orion utility from Oracle was used to generate database traffic. Much like Jetstress, Orion is a lightweight tool that is ideally suited for measuring storage performance. Orion was designed to help administrators understand the performance capabilities of a storage system, either to uncover performance issues or to size a new database installation, without having to create and run an Oracle database. Orion is typically used to measure two types of database activity: response time-sensitive online transaction processing (OLTP) and bandwidth-sensitive online analytical processing (OLAP).
  • Web Server: The industry standard Iometer utility was used to generate web server traffic. The IO definition was composed of random reads of various block sizes. The web server Iometer profile used for this test was originally distributed by Intel, the author of Iometer. Iometer has since become an open source project.[4] Iometer tests were performed on Windows physical drives running over VMware raw mapped devices.
  • Backup: The Iometer utility was used to generate a single stream of large block sequential read traffic. Operations that tend to generate this type of traffic include backup jobs, scan and index operations, long-running database queries, bulk data uploads, and copies. One 256 KB sequential read workload was included in each tile to add a throughput-intensive component to the predominantly random IO profile of interactive e-mail, database, and web server applications. As most experienced database and storage administrators have learned, a throughput-intensive burst in IO traffic can drag down performance for interactive applications, causing performance problems for end-users. A few streams of throughput-intensive read traffic were added to determine whether interactive performance would remain predictably responsive as the amount of mixed IO utilization increased.

Each of the four workloads ran in parallel, with the Jetstress e-mail test taking the longest to complete (approximately three hours). The settings for each of the benchmarks are documented in the appendix.
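Summarized as data, one tile looks roughly like the sketch below. The tool names and IO profiles are taken from the descriptions above; the structure itself is illustrative:

    # One ESG Lab "tile": four concurrently running application workloads,
    # each hosted in its own virtual machine (4 tiles x 4 workloads = 16 VMs).
    TILE = [
        {"workload": "e-mail",     "tool": "Microsoft Jetstress 2010",
         "profile": "random database reads/writes plus sequential log writes"},
        {"workload": "database",   "tool": "Oracle Orion",
         "profile": "small random OLTP IOs and large sequential OLAP reads"},
        {"workload": "web server", "tool": "Iometer",
         "profile": "random reads of various block sizes"},
        {"workload": "backup",     "tool": "Iometer",
         "profile": "single 256 KB large block sequential read stream"},
    ]

    def virtual_machines(tiles: int) -> int:
        return tiles * len(TILE)

    print(virtual_machines(4))  # -> 16 VMs across the two x3850 X5 servers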

Test Bed

VMware vSphere version 4.1 was installed on a pair of powerful IBM x3850 X5 servers, each with four six-core processors and a pair of dual-port host adapters. A Storwize V7000 with 216 300 GB 10K RPM SAS drives was connected to the servers through a pair of 8 Gbps FC switches as shown in Figure 7.

Figure 7. ESG Lab Test Bed

Virtual Machine Configuration

Storwize V7000 disk capacity was used for all storage, including VMware virtual disk files (VMDK), Windows Server 2008 R2 operating system images, application executables, and application data. Disks were configured to use VMware paravirtual SCSI (PVSCSI) adapters.[5] The operating system images were installed on VMDK volumes. All of the application data volumes under test were configured as mapped raw LUNs (also known as raw device mapped, or RDM, volumes).

Storage Configuration

Each of the Exchange instances was configured with an eight-drive RAID-10 database volume and a five-drive RAID-5 log volume. The web server and backup reader workloads ran against eight-drive RAID-10 volumes. The operating system volumes were configured using a 4+1 RAID-5 layout. The Oracle volumes were configured with a combination of RAID-10 and RAID-5 volumes (four RAID-10 4+4 and four RAID-5 4+1). Application-level storage pools were used, with volumes striped over multiple mdisks. For example, the Exchange database volumes were configured using a storage pool composed of four RAID-10 mdisks, each with eight drives. Each of the Exchange VMs had a volume that was striped across all of the disks in that storage pool.
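As a concrete example of the striped pool layout just described, here is a minimal sketch of the Exchange database pool; the drive counts come from the text, and the usable-spindle math is the standard RAID-10 mirroring calculation:

    # Exchange database pool: four RAID-10 mdisks of eight drives each,
    # with every Exchange VM's volume striped across the whole pool.
    MDISKS_IN_POOL = 4
    DRIVES_PER_MDISK = 8

    def pool_drives() -> int:
        return MDISKS_IN_POOL * DRIVES_PER_MDISK

    def raid10_data_spindles(total_drives: int) -> int:
        # RAID-10 mirrors every drive, so half the spindles hold unique data.
        return total_drives // 2

    total = pool_drives()
    print(f"{total} drives in pool, {raid10_data_spindles(total)} data spindles")
    # -> 32 drives in pool, 16 data spindles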

Volume ownership was balanced across the dual controllers within the Storwize V7000 and distributed evenly over the eight host interfaces. The volumes were spread evenly over two VMware host groups with a round robin multipath policy.[6] The drive configuration is summarized in Table 1.
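The balancing scheme can be illustrated with a simple round-robin assignment; the controller and port names below are hypothetical placeholders, not the V7000's actual identifiers:

    from itertools import cycle

    # Spread volume ownership evenly across the two controllers and the
    # eight host-facing FC ports, mirroring the balancing described above.
    controllers = cycle(["node_a", "node_b"])          # hypothetical names
    ports = cycle([f"fc_port{p}" for p in range(8)])   # hypothetical names

    for i in range(16):
        print(f"vol{i:02d} -> {next(controllers)}, {next(ports)}")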

Table 1. Drive Configuration

Why This Matters

ESG research indicates that storage scalability and performance are significant challenges for the growing number of organizations embracing server virtualization technology. Storage benchmarks have historically focused on one type of workload (e.g., database or e-mail) and one key performance metric (e.g., response time or throughput). Server benchmarks have typically tested only one server running a CPU-intensive workload that doesn't stress storage. To help IBM customers understand how a Storwize V7000 performs in a virtual server environment, this benchmark was designed to assess how real-world applications behave when running on multiple virtualized servers sharing a single storage system.

The Results

In a way, storage system benchmark testing is like an analysis of the performance of a car. Specifications including horsepower and acceleration from 0 to 60 are a good first pass indicator of a car's performance. But while specifications provide a good starting point, there are a variety of other factors that should be taken into consideration including the condition of the road, the skill of the driver, and gas mileage ratings. Much like buying a car, a test drive with real-world application traffic is the best way to determine how a storage system will perform.

Characterization

Performance analysis began with an examination of the low-level aggregate throughput capabilities of the test bed. This testing was performed using the Iometer utility running within the eight virtual machines that were used later during mixed workload testing. The eight virtual machines accessed Storwize V7000 storage through eight 8 Gbps FC interfaces.

Iometer access definitions, which measured the maximum throughput from disk, were used for this first pass analysis of the underlying capabilities of the Storwize V7000.[7] Similar to a dynamometer horsepower rating for a car, maximum throughput was used to quantify the power of a turbo-charged Storwize V7000 storage engine. As shown in Figure 8, ESG Lab recorded a maximum throughput of 2.98 GB/sec.
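As a rough cross-check, the measured aggregate can be mapped onto the eight front-end links. The ~800 MB/sec of usable payload per 8 Gbps FC port is a commonly cited rule of thumb, not a measured value from this test:

    # Rough cross-check of aggregate throughput against front-end links.
    # Assumption: an 8 Gbps FC port carries ~800 MB/sec of usable payload.
    AGGREGATE_MBPS = 2980   # measured peak aggregate throughput, MB/sec
    FC_PORTS = 8
    FC_PORT_MBPS = 800      # approximate payload capacity per port

    per_port = AGGREGATE_MBPS / FC_PORTS
    print(f"~{per_port:.0f} MB/sec per port, "
          f"~{per_port / FC_PORT_MBPS:.0%} of link capacity")
    # -> ~372 MB/sec per port, ~47% of link capacity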

Figure 8. Characterizing the V7000 Engine

What the Numbers Mean

  • Much like the horsepower rating of a car, the throughput rating of a storage system is a good indicator of the power of a storage system's engine.
  • Storage throughput is a measure of the bandwidth available to the system. Throughput can be measured on a stream or aggregate basis. A stream is represented by one application or user communicating through one IO interface to one device. Aggregate throughput is a measure of how much data the storage system can move on a whole for all applications and users.
  • ESG Lab throughput characterization was performed using the industry standard Iometer utility as 48 streams performed large sequential reads from eight logical devices through eight FC interfaces.[8]
  • ESG Lab recorded a peak aggregate throughput of 2.98 GB/sec in a VMware vSphere environment.
  • When comparing the performance capabilities of two servers in a virtual server environment, the server with more cache tends to perform better. ESG Lab is confident that a similar pattern holds true for storage systems. A storage system with more cache, and better caching algorithms, should perform better in a virtual server environment.
  • ESG Lab characterization testing indicates that the Storwize V7000 has more than enough cache and front-end bandwidth to meet the needs of virtualized applications.
  • ESG Lab is convinced that the caching algorithms of the Storwize V7000 provide a significant performance boost during virtualized mixed application testing.

Why This Matters

A storage system needs a strong engine and well-designed modular architecture to perform predictably in a mixed real-world environment. One measure of the strength of a storage controller engine is its maximum aggregate throughput. ESG Lab testing of the Storwize V7000 in a VMware vSphere environment achieved 2.98 GB/sec of aggregate large block sequential read throughput.

In ESG Lab's experience, these are excellent results for a dual controller modular storage system. As a matter of fact, these results provide an early indication that the Storwize V7000 is well suited for virtual server consolidation and mixed real-world business applications.

Virtual Machine Utilization

Mixed application testing began with a quick analysis of server memory and CPU utilization to make sure that there were no bottlenecks between virtualized applications and the Storwize V7000. Memory and CPU utilization as reported by the VMware Infrastructure Manager are shown in Figure 9.

Figure 9. Low Memory and CPU Utilization

These screenshots were taken during the peak activity phase of the four-tile test. With memory and CPU utilization at less than 5%, there was no obvious bottleneck between virtualized applications and the Storwize V7000.

Mixed Real-world IOPS Scalability

IOs per second, or IOPS, is a measure of the number of operations a storage system can perform in parallel. When a system is able to move a lot of IOPS, from disk and from cache, it will tend to be able to service more applications and users in parallel. Much like the horsepower rating for a car engine, the IOPS rating for a storage controller can be used as an indicator of the power of a storage system engine.

While IOPS out of a cache is typically a big number and can provide an indication of the speed of the front end of a storage controller, IOPS from disk is a more useful metric when determining the real-world performance of a storage system servicing a mix of business applications. For example, e-mail and interactive database applications tend to be random in nature and therefore benefit from good IOPS from disk. With that said, a mix of real-world applications tends to have random and sequential IO traffic patterns that may be serviced from disk or from cache.
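IOPS and throughput are linked by block size, which is why random and sequential workloads stress a system so differently; a minimal worked example (the block sizes and rates here are illustrative, not measured values from this test):

    # Throughput and IOPS are two views of the same traffic:
    #   throughput (MB/sec) = IOPS x block size (MB)
    def throughput_mbps(iops: float, block_kb: float) -> float:
        return iops * block_kb / 1024

    print(throughput_mbps(20_000, 8))    # small random IO:    ~156 MB/sec
    print(throughput_mbps(2_600, 256))   # large sequential IO: 650 MB/sec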

ESG Lab measured IOPS performance as reported by the Storwize V7000 as the number of virtual machines running mixed real-world application workloads increased from four through sixteen. With a mix of random and sequential IO over 216 disk drives, the goal was not to record a big IOPS number, but to assess the scalability of the Storwize V7000 as an increasing number of applications is consolidated onto a single virtualized platform. The IOPS scalability during the peak period of mixed workload activity is shown in Figure 10.

Figure 10. Storwize V7000 Mixed Workload Scalability

What the Numbers Mean

  • IOPS varied throughout the mixed workload test, with peaks occurring during the Orion small IOPS phase and toward the end of the run as the Jetstress utility performed a database consistency check.
  • A peak of 20,343 IOPS was recorded during the four tile run.
  • IOPS scaled well as mixed real-world application traffic increased from four through sixteen virtual servers.

Why This Matters

Predictable performance scalability is a critical concern when a mix of applications shares a storage system. A burst of IO activity in one application (e.g., a database consistency check) can lead to poor response times, lost productivity, and, in the worst case, lost revenue.

ESG Lab confirmed that the rate of IOs processed by the Storwize V7000 scales as more applications are deployed in a growing virtual server environment.

Handling Throughput Spikes with Ease

As noted during IOPS monitoring, peaks of throughput activity could be correlated with the periodic behavior of real-world applications. Two bursts of aggregate throughput were observed: the first during the Oracle large MBPS test, which simulates a throughput-intensive OLAP application, and the second during the Jetstress database consistency check. A VMware vSphere view of mixed workload performance on one of the servers is shown in Figure 11.

Figure 11. Peak Throughput (One Server, Four Active Tiles, Stacked VM View)

What the Numbers Mean

  • An aggregate throughput level of 1.9 GB/sec was recorded as mixed, real-world applications were run on 16 virtual machines sharing a single Storwize V7000 storage system (950 MB/sec for one of the two physical servers is shown in Figure 11).
  • As throughput intensified during the Oracle Orion OLAP test phase, bandwidth utilization for other mixed workloads operating in parallel remained steady.

Why This Matters

Storage benchmarks typically focus on response time sensitive interactive workloads or throughput-intensive sequential workloads, yet mixed real-world applications in virtualized environments are usually a mix of both.  A burst of activity due to a search and index operation, a database query, a backup job, or a video stream can be extremely throughput-intensive. Deploying more storage systems or more hardware within each storage system is one way to avoid the potential performance impact of a throughput-intensive workload in a mixed environment, but this increases cost and complexity and defeats the goal of shared storage consolidation. ESG Lab observed a peak aggregate throughput of 1.9 GB/sec as an Oracle Orion OLAP job was running-while other applications ran in parallel with predictably good response times.

Mixed Application Performance Scalability

Having looked at the IOPS and throughput ratings of the turbo-charged Storwize V7000 engine, here's where the rubber meets the road as we examine performance at the application level. The output from each of the industry standard benchmark utilities was analyzed to determine the performance scalability and responsiveness of real-world applications running in a consolidated virtual environment.

Microsoft Exchange

The IO and performance efficiency of Microsoft Exchange have improved significantly over the years. Architectural improvements in Exchange 2010, including a new store schema, larger page sizes (8 KB to 32 KB), improved read/write coalescing, improved pre-read support, and increased cache effectiveness, have reduced the number of IOs per user by up to 70% compared to Exchange 2007.[9] ESG Lab typically uses a value of 0.5 IOPS per mailbox to emulate a typical worker when testing with Jetstress 2007. A value of 0.12 IOPS per mailbox was used during Jetstress 2010 testing to reflect the 70% reduction in IOPS compared to Exchange 2007.

The Microsoft Jetstress 2010 utility was used to see how many simulated e-mail users could be supported by the Storwize V7000 during mixed workload testing. The number of IOPS and the response time for each database and log volume were recorded at the end of each Jetstress run. A response time of 20 milliseconds or less for database reads is required to pass the test. This threshold is defined by Microsoft as the limit beyond which end-users will feel that their e-mail system is acting slowly.[10] The results are shown in Figure 12 and itemized in Table 2.
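Putting the per-mailbox profile and the pass criterion together, here is a minimal sketch; the 0.12 IOPS-per-mailbox figure and the 20 millisecond ceiling come from the text, while the helper functions are illustrative:

    # Estimate the database IOPS a mailbox population generates, and apply
    # Microsoft's Jetstress pass criterion for database read latency.
    IOPS_PER_MAILBOX = 0.12        # Exchange 2010 profile used in this test
    DB_READ_LATENCY_LIMIT_MS = 20  # Microsoft pass/fail ceiling

    def required_db_iops(mailboxes: int) -> float:
        return mailboxes * IOPS_PER_MAILBOX

    def jetstress_passes(avg_db_read_ms: float) -> bool:
        return avg_db_read_ms <= DB_READ_LATENCY_LIMIT_MS

    print(required_db_iops(54_208))  # -> ~6505 database IOPS at four tiles
    print(jetstress_passes(5.2))     # -> True (single-tile result)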

Figure 12. Mixed E-mail Scalability (Response Time)

Table 2. Jetstress 2010 Performance Results (One Through Four Tiles)

What the Numbers Mean

  • The single tile mixed application test supported 29,417 Exchange users with an average DB disk response time of 5.2 milliseconds.
  • Performance scaled to 54,208 users while the Storwize V7000 was concurrently servicing other applications.
  • As the number of simulated e-mail users was increased, the Storwize V7000 provided excellent response times that are well within Microsoft's guidelines. Note that response times for database reads are below the Microsoft recommended maximum of 20 milliseconds which is shown as a dotted line in Figure 12.
  • The IO efficiency improvements in Exchange 2010 reduce the cost of delivering e-mail support in mixed virtual server environments. In this case, ESG Lab supported up to 54,208 mailboxes on four virtualized Exchange 2010 servers in a mixed workload environment-more than twice the expected number of supported mailboxes within an Exchange 2007 environment.
