Networked storage is being deployed in conjunction with server virtualization by a growing number of organizations to consolidate, reduce costs, and improve the flexibility and availability of mission-critical applications including databases and e-mail. ESG research indicates that IT managers looking to reap the benefits of server and storage consolidation are concerned about performance. This ESG Lab report presents the results of a new performance benchmark methodology which was designed to assess the real-world performance capabilities of an IBM SAN-attached DS5300 storage system deployed in a highly virtualized, consolidated data center.
Published: October 2, 2008
A worldwide wave of server and storage consolidation is reducing the cost and complexity of delivering IT services to the business. Consolidation is clearly a priority as a growing number of organizations embrace server virtualization technology. In a recent survey, ESG asked IT decision makers to list their top priorities over the next 12-18 months.[1] As shown in Figure 1, increased use of server virtualization, data growth management, and data center consolidation were all top priorities.
However, despite the broad success of server virtualization, nagging issues and challenges exist. As a result, a low percentage of the potential workloads that can be virtualized have been migrated to virtual machines, and the consolidation ratios of virtual machines per physical server remain relatively low. A recent ESG survey explored the storage challenges associated with the next wave of server virtualization.[2] Given the rapid growth in the number of virtual machines being deployed, it's no surprise that scalability, performance, and the overall volume of storage capacity have been identified as key challenges.
The DS5300 is designed to meet the demanding performance requirements of real-world enterprise-class storage environments. With high performance that is optimized for mixed workloads, the DS5300 is designed for modular scalability (capacity and/or performance), high availability, and advanced functionality, including copy services and remote replication. As shown in Figure 2, the DS5300 is a dual controller system supporting up to sixteen 8 Gbps Fibre Channel host interfaces, up to 448 or 480 drives (FC, FC FDE, SSD, FC-SAS, FDE, or SATA), up to 64 GB of cache, and 17 gigabytes per second of internal bandwidth.
This report examines the enterprise-class performance capabilities of the turbo-charged DS5300 including IBM's claim that it is ideally suited to handle the demanding performance requirements of mixed real-world applications deployed in a virtual server environment. In particular, this report demonstrates how a single DS5300 supports:
The real-world performance capabilities of the IBM DS5300 were assessed by ESG Lab via hands-on testing at an IBM facility located in Gaithersburg, Maryland. The methodology presented in this report was designed to assess the performance capabilities of a single IBM DS5300 storage system shared by multiple virtual servers running a mix of real-world application workloads. The cooperation of VMware, IBM, and LSI was key to the success of this project. In particular, this project benefitted from VMware's expertise in helping customers plan for the deployment of business-critical applications in virtual server environments and IBM's long heritage of success in the modular storage systems market in partnership with LSI.
Conventional server benchmarks were designed to measure the performance of a single application running on a single operating system inside a single physical computer. SPEC CPU2000 and CPU2006 are well-known examples of this type of server benchmarking tool. Much like traditional server benchmarks, conventional storage system benchmarks were designed to measure the performance of a single storage system running a single application workload. The SPC-1 benchmark, developed and managed by the Storage Performance Council with IBM playing a key role, is a great example. SPC-1 was designed to assess the performance capabilities of a single storage system as it services an online interactive database application.
Traditional benchmarks running a single application workload can't help IT managers understand what happens when a mix of applications is deployed together in a virtual server environment. To overcome these limitations, VMware created a mixed workload benchmark called VMmark. VMmark uses a tile-based scheme for measuring application performance and provides a consistent methodology that captures both the overall scalability and individual application performance of a virtual server solution. The novel VMmark tile concept is simple, yet elegant. A tile is defined as a mix of industry standard benchmarks that emulate common business applications (e.g., e-mail, database, web server). The number of tiles running on a single machine is increased until the server runs out of performance headroom. A score is derived so that IT managers can compare servers with a focus on their performance capabilities when running virtualized applications. As an example, the high-end IBM x3850 servers used during this ESG Lab Validation have an excellent published VMmark score of 13.5 tiles.
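The tile-based scoring idea can be sketched in a few lines of Python. The sketch assumes a simplified reading of VMmark's approach: each workload's throughput is normalized against a reference result, a tile's score is the geometric mean of its normalized workload scores, and the overall score is the sum of the tile scores. The workload names, reference values, and measurements are hypothetical, and this is not VMware's scoring code.

```python
import math
from typing import Dict, List

# Hypothetical reference throughputs (operations per second) for one tile's
# workloads; illustrative values, not VMware's published reference results.
REFERENCE: Dict[str, float] = {"email": 1000.0, "database": 500.0, "web": 2000.0}


def tile_score(measured: Dict[str, float], reference: Dict[str, float]) -> float:
    """Geometric mean of each workload's throughput normalized to the reference."""
    ratios = [measured[name] / reference[name] for name in reference]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def overall_score(tiles: List[Dict[str, float]], reference: Dict[str, float]) -> float:
    """Sum of per-tile scores across all tiles running on the server."""
    return sum(tile_score(tile, reference) for tile in tiles)


if __name__ == "__main__":
    # Two hypothetical tiles; the second runs 10% slower as the server fills up.
    tiles = [
        {"email": 1000.0, "database": 500.0, "web": 2000.0},
        {"email": 900.0, "database": 450.0, "web": 1800.0},
    ]
    print(f"overall score: {overall_score(tiles, REFERENCE):.2f}")
```

Summing tile scores rewards adding tiles only while each new tile still performs, and the geometric mean keeps one strong workload from masking a weak one within a tile.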
While VMmark is well suited for understanding the performance of a mix of applications running on a single server, it was not designed to assess what happens when a mix of applications is run on multiple servers sharing a single storage system. VMmark tends to stress server internals more than it does the storage system. The methodology presented in this report was designed to stress the storage system more than the servers. Taking a cue from the VMmark methodology, a tile-based concept was used during this ESG Lab Validation. As shown in Figure 3, each tile is composed of a mixture of four application workloads. Two physical servers, each configured with eight virtual machines, were used to measure performance as the number of active tiles was increased from one to four.
Industry standard benchmarks were used to emulate the I/O activity of four common business application workloads:
Each of the four workloads ran in parallel, with the JetStress e-mail test taking the longest to complete (approximately three hours). The Iometer workloads were stopped manually after the JetStress utility had finished.
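The relationship between tiles and virtual machines can be illustrated with a small sketch. It assumes each tile contributes one virtual machine per workload (four per tile) and, purely as an assumption, places those machines round-robin across the two ESX hosts; the report does not describe how tiles were split between the servers. The host names and VM labels are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# The four workloads in each tile, as named elsewhere in this report.
WORKLOADS = ["jetstress_email", "orion_oracle", "iometer_web", "iometer_scan_read"]
HOSTS = ["esx-host-a", "esx-host-b"]  # hypothetical names for the two x3850s
VMS_PER_HOST = 8


def place_tiles(num_tiles: int) -> List[Tuple[str, str]]:
    """Return (host, vm_label) pairs: one VM per workload per tile.

    Round-robin placement is an assumption, not the documented layout.
    """
    placement: List[Tuple[str, str]] = []
    for tile in range(1, num_tiles + 1):
        for workload in WORKLOADS:
            host = HOSTS[len(placement) % len(HOSTS)]
            placement.append((host, f"tile{tile}-{workload}"))
    per_host = Counter(host for host, _ in placement)
    if max(per_host.values(), default=0) > VMS_PER_HOST:
        raise ValueError("tile count exceeds the eight VMs configured per host")
    return placement


if __name__ == "__main__":
    for tiles in range(1, 5):
        print(f"{tiles} tile(s) -> {len(place_tiles(tiles))} active VMs")
```

Under this assumption, one through four tiles correspond to four, eight, twelve, and sixteen active virtual machines, with four tiles filling both hosts.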
VMware ESX Server 3.5 software was installed on a pair of powerful IBM System x x3850 servers, each with four quad-core 3 GHz processors and 128 GB of RAM. Each server had four dual-port 4 Gbps FC host bus adapters connected to a Cisco MDS-9513 FC SAN switch. A DS5300 with 256 15K RPM FC drives was connected to the servers via sixteen 4 Gbps FC ports, as shown in Figure 4.

The DS5300 drive configuration is summarized in Table 1. Two Microsoft Exchange storage groups and two Oracle databases were tested within each tile. Exchange database volumes were configured over eight-drive RAID-10 groups. To simulate a pair of database applications with different performance and cost requirements, one of the Oracle databases was configured with RAID-10 and the second with RAID-5. The web server and scan/read volumes were configured using a 7+1 RAID-5 layout. Volume ownership was balanced across the dual controllers within the DS5300 and distributed evenly over the 16 host interfaces.[7]
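To make the RAID-10 versus RAID-5 trade-off concrete, the sketch below compares usable capacity and uncached random IOPS for an eight-drive RAID-10 group and a 7+1 RAID-5 group. It uses textbook write penalties (two back-end writes per host write for RAID-10, four back-end I/Os for a RAID-5 small write) and, as assumptions, a 60% read mix and the 100 IOPS-per-drive rule of thumb quoted later in this report. The report gives the 7+1 layout only for the web server and scan/read volumes, so applying it to the RAID-5 Oracle database is also an assumption, and the model ignores the controller cache that absorbs writes on a real DS5300.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidGroup:
    name: str
    drives: int          # physical drives in the group
    data_drives: int     # drives' worth of usable capacity
    write_penalty: int   # back-end I/Os per random host write (textbook value)


def host_iops(group: RaidGroup, iops_per_drive: float, read_fraction: float) -> float:
    """Random host IOPS the group sustains before its drives saturate (no cache)."""
    backend_capacity = group.drives * iops_per_drive
    # A host read costs one back-end I/O; a host write costs write_penalty I/Os.
    cost_per_host_io = read_fraction + (1.0 - read_fraction) * group.write_penalty
    return backend_capacity / cost_per_host_io


RAID10_8 = RaidGroup("RAID-10, 8 drives", drives=8, data_drives=4, write_penalty=2)
RAID5_7P1 = RaidGroup("RAID-5, 7+1", drives=8, data_drives=7, write_penalty=4)

if __name__ == "__main__":
    # Assumed workload: 60% random reads, 100 IOPS per drive.
    for group in (RAID10_8, RAID5_7P1):
        iops = host_iops(group, iops_per_drive=100, read_fraction=0.6)
        print(f"{group.name}: {group.data_drives}/{group.drives} drives usable, "
              f"~{iops:.0f} host IOPS")
```

Under these assumptions the RAID-5 group offers nearly twice the usable capacity from the same eight drives but fewer random host IOPS, which is the kind of performance-versus-cost difference the two Oracle configurations were meant to simulate.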

Figure 5 shows the configuration of one of the sixteen virtual machines, one of the four used for JetStress e-mail testing. Each virtual machine was mapped to a quad-core CPU, 16 GB of RAM, a virtual disk over VMFS for the operating system, and one or more mapped raw LUNs. DS5300 disk capacity was used for all storage, including VMware virtual disk files (VMDK), Windows 2003 operating system images, application executables, and application data. All of the application data volumes under test were configured as mapped raw LUNs (also known as raw device mapping, or RDM, volumes). Note how four mapped raw LUNs were configured for the JetStress virtual machine: two for the Exchange database volumes and two for the Exchange log volumes.

Why This Matters
ESG research indicates that performance is one of the top five storage infrastructure challenges when it comes to supporting virtual server environments. Storage benchmarks have historically focused on one type of workload (e.g., database or e-mail) and one key performance metric (response time or throughput). Server benchmarks have typically tested only one server running a CPU-intensive workload that doesn't stress storage. So that IBM customers can understand how a DS5300 performs in a virtual server environment, this benchmark was designed to assess how real-world applications behave when running on multiple virtualized servers sharing a single storage system.
In a way, storage system benchmark testing is like an analysis of the performance of a car. Specifications including horsepower and acceleration from zero to sixty are a good first pass indicator of a car's performance. While specifications provide a good starting point, there are a variety of other factors that should be taken into consideration including the condition of the road, the skill of the driver, and gas mileage ratings. Much like buying a car, a test drive with real-world application traffic is the best way to determine how a storage system will perform in real-world conditions.
Performance analysis began with an examination of the low level aggregate throughput capabilities of the test bed. This testing was performed using the Iometer utility running on ten entry-level IBM x335 physical servers running Microsoft Windows operating systems. Half of the drives used later in the mixed, real-world tests were exercised (128 disk drives).
A total of ten servers with twenty 4 Gbps FC ports were connected through a Cisco MDS 9513 switch to the DS5300 with sixteen active 4 Gbps host ports. A total of 16 LUNs were exercised. Each of the LUNs was configured over a RAID-5 group of 15K RPM drives configured with a 4+1 parity scheme. Each of the Windows servers exercised two LUNs distributed across both DS5300 controllers.
An Iometer profile of 1 MB sequential reads and 1 MB sequential writes was used for this first-pass analysis of the raw aggregate throughput capabilities of the DS5300. A similar round of tests performed against the same test bed using an IBM p595 server running the AIX operating system produced comparable results. Much like a dynamometer horsepower rating for a car, the maximum throughput reported by the DS5000 console was used to quantify the power of a turbo-charged DS5300 storage engine.

Why This Matters
A storage system needs a strong engine and a well-designed modular architecture to perform predictably in a mixed real-world environment. One measure of the strength of a storage controller engine is its maximum aggregate throughput. ESG Lab confirmed that a DS5300 system with half the drives used during the mixed workload tests presented later in this report can sustain an excellent 6.2 GB/sec of aggregate large-block sequential read throughput. In ESG Lab's experience, this is an extremely impressive result for a dual controller modular storage system. This result indicates that the DS5000 should be well suited for virtual server consolidation and mixed real-world business applications, and it is definitely well suited for clustered computing, video editing, and scientific applications with extreme bandwidth requirements.
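The 6.2 GB/sec result can be put in context with simple arithmetic. The sketch assumes roughly 400 MB/s of usable bandwidth per 4 Gbps FC port in each direction (a commonly quoted nominal figure, not a value measured in this test) and assumes the Cisco switch was not a bottleneck; the port and drive counts come from the low-level test described above.

```python
# Figures from the low-level throughput test in this report.
MEASURED_GB_PER_SEC = 6.2
HOST_PORTS = 20        # twenty 4 Gbps FC ports across the ten x335 servers
ARRAY_PORTS = 16       # active 4 Gbps host ports on the DS5300
DRIVES = 128

# Assumption: ~400 MB/s usable per 4 Gbps FC port, per direction (nominal).
MB_PER_SEC_PER_PORT = 400


def link_ceiling_gb_per_sec(host_ports: int, array_ports: int, mb_per_port: float) -> float:
    """With a non-blocking switch, the side with fewer ports bounds bandwidth."""
    return min(host_ports, array_ports) * mb_per_port / 1000


ceiling = link_ceiling_gb_per_sec(HOST_PORTS, ARRAY_PORTS, MB_PER_SEC_PER_PORT)
print(f"front-end link ceiling: ~{ceiling:.1f} GB/s")
print(f"measured / ceiling:     ~{MEASURED_GB_PER_SEC / ceiling:.0%}")
print(f"per drive:              ~{MEASURED_GB_PER_SEC * 1000 / DRIVES:.1f} MB/s")
```

Under these assumptions the measured figure sits within a few percent of the nominal front-end link ceiling of sixteen 4 Gbps ports, and works out to under 50 MB/s per drive.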
After the low-level throughput testing with ten entry-level physical servers was complete, the DS5300 was reconfigured for mixed real-world testing using a pair of high-end IBM x3850 servers, as documented previously in this report. Mixed application testing began with a quick analysis of server memory and CPU utilization to make sure that there were no bottlenecks between the virtualized applications and the DS5300. Memory and CPU utilization as reported by the VMware Infrastructure Manager are shown in Figure 7.
These screenshots were taken during the peak activity phase of the four-tile test. With memory utilization under 50% and CPU utilization under 25%, there were no obvious bottlenecks between the virtualized applications and the IBM DS5300.
I/Os per second, or IOPS, is a measure of the number of I/O operations that a storage system can complete each second. When a system can service a high rate of I/O operations from disk or out of cache, it will tend to be able to support more applications and users in parallel. Much like the torque rating for a car engine, the IOPS rating for a storage controller can be used as an indicator of the power of a storage system engine.
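Because IOPS is a rate, the number of operations actually in flight also depends on response time. Little's Law (average outstanding I/Os = IOPS × average response time) links the two; the sketch below applies it with illustrative values that are not measurements from this report.

```python
def outstanding_ios(iops: float, response_time_ms: float) -> float:
    """Little's Law: average concurrency = completion rate x time in system."""
    return iops * response_time_ms / 1000.0


def iops_for_queue_depth(queue_depth: float, response_time_ms: float) -> float:
    """Rearranged: the IOPS a given number of outstanding I/Os can sustain."""
    return queue_depth * 1000.0 / response_time_ms


if __name__ == "__main__":
    # Illustrative values only, not results from this report.
    print(outstanding_ios(10_000, 5.0))     # 50.0 I/Os in flight
    print(iops_for_queue_depth(32, 8.0))    # 4000.0 IOPS
```

The rearranged form shows why a system that keeps response times low can deliver more IOPS for the same number of concurrent requests from applications.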
While IOPS out of a cache is typically a big number and can provide an indication of the speed of the front end of a storage controller, IOPS from disk is a more useful metric when determining the real-world performance of a storage system servicing a mix of business applications. For example, e-mail and interactive database applications tend to be random in nature and therefore benefit from good IOPS from disk. With that said, a mix of real-world applications tends to have random and sequential I/O traffic patterns that may be serviced from disk or from cache.
ESG Lab measured IOPS performance as reported by the DS5300 as the number of virtual machines running mixed, real-world application workloads was increased from four through sixteen. With a mix of random and sequential I/O over hundreds of disk drives, the goal was not to record a big IOPS number; it was to assess the scalability of the DS5300 as an increasing number of applications are consolidated onto a single virtualized platform. The IOPS scalability during the peak period of mixed workload activity is shown in Figure 8.
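One way to quantify a scalability assessment like this is linear-scaling efficiency: the IOPS speed-up relative to the smallest run divided by the increase in virtual machines. This metric is an illustrative choice, not one defined in the report, and the sample readings below are hypothetical, not the values plotted in Figure 8.

```python
from typing import Dict


def scaling_efficiency(iops_by_vms: Dict[int, float]) -> Dict[int, float]:
    """Measured speed-up over the smallest run divided by ideal linear speed-up.

    A value of 1.0 means IOPS grew in proportion to the number of VMs.
    """
    base_vms = min(iops_by_vms)
    base_iops = iops_by_vms[base_vms]
    return {
        vms: (iops / base_iops) / (vms / base_vms)
        for vms, iops in sorted(iops_by_vms.items())
    }


if __name__ == "__main__":
    # Hypothetical IOPS readings, NOT the values plotted in Figure 8.
    sample = {4: 5_000.0, 8: 9_800.0, 12: 14_400.0, 16: 18_800.0}
    for vms, eff in scaling_efficiency(sample).items():
        print(f"{vms:2d} VMs: {eff:.0%} of linear")
```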

Why This Matters
Predictable performance scalability is a critical concern when a mix of applications shares a storage system. A burst of I/O activity in one application (e.g., a database consistency check) can lead to poor response times, lost productivity, and, in the worst case, lost revenue. ESG Lab confirmed that the rate of I/Os processed by the DS5000 scaled extremely well as the number of virtual machines running a mix of real-world application workloads in parallel increased.
As noted during IOPS monitoring, there were peaks of throughput activity that could be correlated with the periodic behavior of real-world applications. Two bursts of aggregate throughput were observed: the first during the Oracle large MBPS test, which simulates a throughput-intensive OLAP application, and the second during the JetStress database consistency check. The peak recorded shortly after the Orion OLAP phase is shown in Figure 9.

Why This Matters
Storage benchmarks typically focus on response time sensitive interactive workloads or throughput intensive sequential workloads, yet mixed real-world applications in virtualized environments are usually a mix of both. A burst of activity due to a search and index operation, a database query, a backup job, or a video stream can be extremely throughput intensive. Deploying more storage systems, or more hardware within each storage system, is one way to avoid the potential performance impact of a throughput intensive workload in a mixed environment, yet this increases cost and complexity and defeats the goal of shared storage consolidation. ESG Lab observed a peak aggregate throughput of 1.6 GB/sec as a throughput intensive Oracle Orion OLAP test was running, even as Exchange e-mail traffic ran with predictably good response times. As throughput intensified during the Oracle Orion OLAP test phase, bandwidth utilization for other mixed workloads operating in parallel remained steady.
Having looked at the IOPS and throughput ratings of the turbo-charged DS5300 engine, here's where the rubber meets the road as we examine performance at the application level. The output from each of the industry standard benchmark utilities was analyzed to determine the performance scalability and responsiveness of real-world applications running in a consolidated virtual environment.
The Microsoft JetStress testing tool was used to see how many simulated e-mail users could be supported by the DS5300 resources allocated for the Exchange application. The number of IOPS and the response times for each database and log volume were recorded at the end of each JetStress run. A response time goal of 20 milliseconds or less for DB reads was required to pass the test. This value is defined by Microsoft as a limit beyond which end-users will feel that their e-mail system is acting slowly.
ESG used the following guidelines from an IBM Mailbox JetStress Analysis report to interpret the results:
In an enterprise Exchange 2007 environment, performance is usually designed around a 0.5 IOPS user profile, which is equivalent to a very heavy Exchange user. While disk performance varies, generally you should calculate based on a one hundred IOPS per disk metric, which is a conservative starting point, and tune from there for your specific environment.[8]
Microsoft JetStress logs were used to determine the number of IOPS and response times as the number of active virtual machines was increased from four through sixteen.[9] Based on a 0.5 IOPS user profile, the number of IOPS was used to calculate the number of supported Exchange users. Exchange user scalability as the number of tiles was increased from one to four is shown in Figure 10 and Table 2.
Keep in mind that Exchange 2010 requires 70% less I/O than Exchange 2007, so users of Exchange 2010 should adjust their expectations upward.
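The user calculation described above can be sketched as follows. It applies the 20 ms database read latency pass criterion, the 0.5 IOPS per user profile, and the 100 IOPS per disk rule of thumb quoted from IBM. The Exchange 2010 constant simply takes the 70% reduction at face value and is an assumption, not a Microsoft sizing figure; the function names are hypothetical.

```python
import math

IOPS_PER_USER_2007 = 0.5          # "very heavy" Exchange 2007 user profile (IBM)
IOPS_PER_DISK = 100               # conservative per-disk starting point (IBM)
DB_READ_LIMIT_MS = 20.0           # JetStress pass criterion for database reads
# Assumption: the quoted 70% I/O reduction applied directly to the user profile.
IOPS_PER_USER_2010 = IOPS_PER_USER_2007 * (1 - 0.70)


def supported_users(measured_iops: float, db_read_ms: float,
                    iops_per_user: float = IOPS_PER_USER_2007) -> int:
    """Users a JetStress run supports; zero if the read latency goal was missed."""
    if db_read_ms > DB_READ_LIMIT_MS:
        return 0
    return int(measured_iops / iops_per_user)


def disks_for_users(users: int, iops_per_user: float = IOPS_PER_USER_2007) -> int:
    """Rule-of-thumb drive count needed to deliver the users' IOPS."""
    return math.ceil(users * iops_per_user / IOPS_PER_DISK)


if __name__ == "__main__":
    # 8,756 IOPS is what a 17,512-user total implies at 0.5 IOPS/user;
    # the 15 ms read latency is a hypothetical passing value.
    print(supported_users(8_756, 15.0))                       # 17512
    print(disks_for_users(17_512))                            # 88
    print(supported_users(8_756, 15.0, IOPS_PER_USER_2010))   # Exchange 2010 estimate
```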

This test used two servers and focused solely on storage performance and sizing. The single IBM DS5300 storage array had significant resources remaining and was under-utilized throughout each test run. At 17,512 users over two physical servers (roughly 8,750 users per physical server), this result approaches Microsoft's recommended guideline of 10,000 users per server. Microsoft does not recommend more than 10,000 users per server due to the impact of that many users on recovery time service level agreements. In a production environment, it is recommended to stay within Microsoft's recommendations for support and recovery purposes.
The Oracle Orion utility was used to measure small transfer (8 KB) IOPS and response time and large transfer (1 MB) throughput. The small results are used to predict the performance and scalability of response time sensitive interactive database applications (e.g., OLTP). The large results are used to predict the performance of throughput intensive database mining applications (e.g., DSS).
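Because throughput equals IOPS multiplied by transfer size, the same rate of operations means very different bandwidth at 8 KB and at 1 MB. The sketch below converts between the two, assuming binary units (1 KB = 1,024 bytes, 1 MB = 1,048,576 bytes); the IOPS and MB/s inputs are illustrative, not Orion results.

```python
KIB = 1024
MIB = 1024 * 1024


def mb_per_sec(iops: float, transfer_bytes: int) -> float:
    """Throughput in MiB/s implied by an IOPS rate at a fixed transfer size."""
    return iops * transfer_bytes / MIB


def iops_from_mb_per_sec(mib_per_sec: float, transfer_bytes: int) -> float:
    """IOPS implied by a throughput in MiB/s at a fixed transfer size."""
    return mib_per_sec * MIB / transfer_bytes


if __name__ == "__main__":
    # Orion's small (8 KB) and large (1 MB) transfer sizes from this report.
    small, large = 8 * KIB, 1 * MIB
    print(mb_per_sec(10_000, small))            # 78.125 MiB/s
    print(iops_from_mb_per_sec(1_000, large))   # 1000.0 IOPS
```

Ten thousand small I/Os move less than 80 MiB/s, which is why small-transfer results are judged by IOPS and response time while large-transfer results are judged by throughput.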
ESG used the following guidelines from presentations given at Oracle OpenWorld in November 2007 to interpret the results:
Target 5-10 millisecond response time for disks performance response time critical IO. Start by assuming 30 IOPS per disk for OLTP and 20 MB/sec per disk in DSS. This is way below the theoretical value but allow for media repair etc.[10]
For new or non-existing applications, use business rules or data model transaction profiles flow to understand "what is a transaction", and then extrapolate for transactions per second or hour. Optionally you can use the numbers we have seen in our consulting gigs. Note that these are just guideline values. Use the following as basic guidelines for OLTP:
Low transaction system - 1,000 IOPS or 200MBytes/s
Medium transaction system - 5,000 IOPS or 600 Mbytes/s
High-end transaction system - 10,000 IOPS or 1Gbytes/s <- almost rarely achievable and usually TPC-C type workloads[11]
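The Oracle guidelines above translate into a simple disk-count and tier check. This sketch treats the per-disk DSS figure as 20 MB/s and uses the 10 ms upper end of the response time target; the function names are hypothetical, and the output is a rough starting point, not an Oracle sizing tool.

```python
import math

OLTP_IOPS_PER_DISK = 30          # starting point for OLTP from the guidance
DSS_MB_PER_SEC_PER_DISK = 20     # per-disk streaming assumption for DSS
RESPONSE_TIME_TARGET_MS = (5.0, 10.0)

# OLTP tiers from the same guidance: (name, IOPS, MB/s)
TIERS = [
    ("low", 1_000, 200),
    ("medium", 5_000, 600),
    ("high-end", 10_000, 1_000),
]


def disks_for_oltp(target_iops: float) -> int:
    return math.ceil(target_iops / OLTP_IOPS_PER_DISK)


def disks_for_dss(target_mb_per_sec: float) -> int:
    return math.ceil(target_mb_per_sec / DSS_MB_PER_SEC_PER_DISK)


def classify(iops: float) -> str:
    """Highest tier whose IOPS level a measured small-I/O rate reaches."""
    tier = "below low"
    for name, tier_iops, _ in TIERS:
        if iops >= tier_iops:
            tier = name
    return tier


def meets_latency(response_time_ms: float) -> bool:
    return response_time_ms <= RESPONSE_TIME_TARGET_MS[1]


if __name__ == "__main__":
    for name, iops, mbps in TIERS:
        print(f"{name:9s}: {disks_for_oltp(iops):4d} disks for {iops} IOPS, "
              f"{disks_for_dss(mbps):3d} disks for {mbps} MB/s")
```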
The results for the four-tile Orion test are summarized in Table 3. A sample Orion report is shown in the Appendix as Figure 12.

Performance results as reported by the Iometer utility for the web server and scan/read workloads executing within virtual machines during the four-tile test are listed in Table 4.
