ESG Technical Review: Huawei OceanStor Dorado V6 All-flash Storage


Abstract

This ESG Technical Review documents hands-on performance and availability testing of Huawei OceanStor Dorado V6 all-flash storage and presents the findings of a five-year TCO analysis highlighting the economic benefits of Huawei OceanStor Dorado V6 when compared with hybrid storage systems from major vendors.

The Challenges

ESG asked flash users what benefits their organization has realized as a result of deploying flash storage. Improved application performance was the most common response (48%) followed by improved total cost of ownership (47%).1 When leveraging the NVMe protocol, flash storage environments can experience reduced latencies and higher overall performance. Participants familiar with NVMe were asked to identify which objectives were the motivation behind their interest in NVMe (see Figure 1). The most common response was a desire to overall “future proof” the environment (56%), narrowly edging out improving the performance of existing applications (55%).

In the same survey, respondents were asked to name their top storage challenges. The four most-cited block storage (SAN) challenges for 2019 were hardware costs (30%), data protection (27%), data placement (24%), and rapid data growth (24%). All of these issues have been at or near the top of the list since 2015. The overarching issues that drive these data storage concerns are relatively unchanged: data growth is accelerating, and the resulting infrastructure required to store and protect that data is costly and complex.

Organizations are tasked with providing a high-quality, predictable, and productive computing environment for an ever-growing number of internal users and external customers. In addition, enterprise application environments have become increasingly unpredictable as their underlying IT infrastructure grows in complexity and size. Mission-critical business application performance is sensitive to storage performance and latency, and highly dependent on the resilience of the IT environment.

The ability to consolidate critical workloads and functions onto a single all-flash storage system has proven to provide significant TCO benefits if an organization’s performance, reliability, and operational requirements can be met. While many storage vendors offer all-flash solutions, the design decisions and tradeoffs made by these vendors can result in very different system capabilities and ultimately tradeoffs in benefits to an organization.

The Solution: Huawei OceanStor Dorado V6 All-flash Storage

Huawei designed the OceanStor Dorado all-flash storage to handle mission-critical applications, both internal and customer-facing, for large enterprises, as well as mixed workloads. The OceanStor Dorado leverages a multi-controller, end-to-end non-volatile memory express (NVMe) architecture that Huawei calls SmartMatrix to reduce the latency in accessing NVMe and flash-based storage and to ensure high availability. The OceanStor Dorado V6 scales out to as many as 32 controllers.

Huawei offers many features to maximize OceanStor Dorado performance, availability, and efficiency, while minimizing overall total cost of ownership (TCO):

Performance: The Huawei OceanStor Dorado all-flash storage system has an NVMe-based architecture, which supports direct communication between the CPU and NVMe SSDs. This eliminates the need for SCSI-SAS conversion and shortens the data transmission path, lowering end-to-end latency. SAS flash drives are supported for workloads that don’t require the extreme performance of NVMe. The system also incorporates a disk controller collaboration algorithm developed by Huawei, which synchronizes the data layout between SSDs and controllers. The algorithm is designed to deliver consistently low latency so that mission-critical applications operate smoothly.

Availability: The Huawei SmartMatrix fully interconnected architecture tolerates the failure of up to seven of eight controllers. Huawei also employs multiple layers of software to provide high availability in its platform, including RAID-TP, its implementation of triple-parity RAID that tolerates up to three simultaneous disk failures.

  • SmartMatrix architecture supports full interconnection between front-end interface cards, controllers, and back-end disk enclosures. Combined with cache triple copy and continuous mirroring in software, this provides extremely high hardware fault tolerance. OceanStor Dorado tolerates the failure of seven out of eight controllers and ensures uninterrupted business continuity.
  • RAID-TP also enables fast rebuild of data, with hot spare space spread across all disk modules.
  • HyperSnap snapshots provide point-in-time redirect-on-write (ROW) snapshots for fast recovery of data with minimal impact on performance. ROW snapshots are optimized for performance, requiring one-third of the I/O operations of copy-on-write snapshots with no computational overhead when reading snapshots.
  • HyperReplication provides traditional remote replication to a standby data center. HyperMetro employs a gateway-free active/active deployment of two storage arrays that balances the load between them. If one array or one link between an array and a host fails, HyperMetro permits non-disruptive cross-site takeover to ensure continuous data access for the most critical business applications.
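
The one-third figure quoted for ROW snapshots in the list above can be illustrated with a toy I/O count. This is a simplified sketch, not Huawei's implementation. It assumes that the first overwrite of a snapshotted block under copy-on-write costs one read of the original data, one write to preserve it in snapshot space, and one write of the new data, while redirect-on-write only writes the new data to a fresh location and updates a pointer. The function names are hypothetical.

```python
def cow_ios(overwrites: int) -> int:
    """Back-end I/Os for first-time overwrites of snapshotted blocks under copy-on-write.

    Assumed cost per overwrite: read old block + copy it to snapshot space + write new data.
    """
    return overwrites * 3


def row_ios(overwrites: int) -> int:
    """Back-end I/Os for the same overwrites under redirect-on-write.

    Assumed cost per overwrite: one write of the new data to a new location
    (the pointer/metadata update is not counted in this model).
    """
    return overwrites * 1


blocks = 1_000
print(f"COW: {cow_ios(blocks)} I/Os, ROW: {row_ios(blocks)} I/Os")
print(f"ROW / COW = {row_ios(blocks) / cow_ios(blocks):.2f}")
```

Under these assumptions the ratio is one to three regardless of how many blocks are overwritten. Real arrays add metadata and garbage-collection costs that this model ignores.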

Automated Data Infrastructure Management: The Huawei OceanStor Dorado all-flash storage system provides a three-layer AI architecture to help organizations automate data management. DeviceManager simplifies configuration, operations, and maintenance at the system and component level. OceanStor DJ virtualizes storage systems into a programmable storage pool, enabling automated, service-based storage to be easily provisioned and managed. The eService Intelligent Operations and Maintenance (O&M) platform helps organizations manage the full lifecycle intelligently, from resource provisioning to fault fingerprinting to predictive analytics and optimization. Huawei has also introduced FlashEver, an upgrade service that allows organizations to independently replace controllers and disk enclosures with no data migration or service disruption. The FlashEver service also provides free access to next-generation hardware.2

Cost of Ownership: Huawei employs its implementation of smart deduplication and compression to provide significant data reduction in the OceanStor Dorado V6. Huawei offers customers a new business model it calls “available capacity”: real-world, guaranteed storage capacity3 that leverages built-in data deduplication and compression to reduce the storage space of application data while ensuring performance. This can offer customers enhanced return on investment (ROI) through the resulting reduction in data center footprint, power, and cooling resources. SmartThin provisioning enables organizations to provision only what they need today and grow capacity on demand, non-disruptively.

ESG Tested

ESG performed hands-on testing and validation of the Huawei OceanStor Dorado V6 all-flash storage system at Huawei’s facilities in Chengdu, China. Testing was designed to validate the performance, reliability, data management, and TCO of the OceanStor Dorado V6 storage platform with a focus on delivering high levels of predictable performance. The ability to sustain these performance levels through various storage hardware failures was also tested. Finally, we looked at the automated data management capabilities of the platform and performed a five-year TCO analysis.

Performance

The performance test bed utilized by ESG consisted of 12 Huawei FusionServer RH Series rack servers, each with dual eight-core processors and 256 GB of RAM. The servers were connected to redundant Fibre Channel switches via 16GFC. The switches were then connected via two pairs of 16GFC cables to one Huawei OceanStor Dorado 18000 V6 storage system with one controller enclosure containing a total of four controllers and four disk enclosures, populated with a total of 41 1.92TB NVMe SSD modules.

An Oracle RAC 18c Enterprise Edition cluster was deployed on 12 physical servers with 6.8 TB of capacity allocated across 20 volumes. OLTP testing on the cluster was performed using Swingbench version 2.5.0.919. To test for latency, SLOB version 2.4.2 was used with one instance of Oracle 18c Enterprise Edition to run an OLTP workload. ESG leveraged the Huawei OceanStor DeviceManager interface (shown in Figure 3) to manage the environment and monitor the tests. Test results were verified using output files and logs from the tested applications.

First, ESG started an OLTP workload using SLOB. The workload was configured for 75% reads and 25% updates. After the workload had been running for 30 minutes, the system was servicing about 220,000 sustained IOPS at an average response time of just 81 µs.

Next, we tested the Oracle RAC cluster using Swingbench. Once the warmup phase was complete and the workload stabilized, we let the test run for 30 minutes. As seen in Figure 4, the OceanStor Dorado 18000 V6 was able to service 846,091 IOPS at just 477 µs.
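
One way to relate the IOPS and response-time figures above is Little's Law, which states that the average number of requests in flight equals throughput multiplied by average response time. The sketch below applies it to the two reported results. Treating the reported averages as steady-state values is an assumption, and the function name is illustrative only.

```python
def outstanding_ios(iops: float, latency_us: float) -> float:
    """Average I/Os in flight per Little's Law: L = throughput * response time."""
    return iops * latency_us / 1_000_000  # convert microseconds to seconds


slob = outstanding_ios(220_000, 81)          # single-instance SLOB result
swingbench = outstanding_ios(846_091, 477)   # 12-node RAC Swingbench result

print(f"SLOB: ~{slob:.1f} I/Os in flight")
print(f"Swingbench: ~{swingbench:.1f} I/Os in flight")
```

Read this way, the Swingbench run kept roughly 20 times more I/O in flight than the SLOB run, which is consistent with the higher average response time under the heavier 12-node load.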

Next, we tested the impact of snapshots on performance. With the database workload still running, we created a protection group and added all 20 volumes in the Oracle RAC cluster to it.

Next, we configured HyperCDP to take a snapshot every three seconds and set the retention to the maximum of 60,000 snapshots. It’s important to note that 60,000 is the retention limit for a single LUN; HyperCDP can manage 2 million total snapshots. Snapshots can also be scheduled.

The workload ran for more than 30 minutes and HyperCDP created more than 12,000 snapshots during this time.
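
The snapshot count follows from the schedule. The sketch below is a back-of-the-envelope check, assuming each scheduled HyperCDP run produces one snapshot per volume in the protection group; the constant and function names are hypothetical.

```python
INTERVAL_S = 3            # one snapshot every three seconds (from the text)
VOLUMES = 20              # volumes in the protection group (from the text)
PER_LUN_LIMIT = 60_000    # retention limit for a single LUN (from the text)


def snapshots_created(duration_s: int, interval_s: int = INTERVAL_S,
                      volumes: int = VOLUMES) -> int:
    """Total snapshots across all volumes, assuming one per volume per interval."""
    return (duration_s // interval_s) * volumes


thirty_min = snapshots_created(30 * 60)
hours_to_limit = PER_LUN_LIMIT * INTERVAL_S / 3600

print(f"Snapshots in 30 minutes: {thirty_min}")
print(f"Hours until a LUN reaches its retention limit: {hours_to_limit}")
```

Thirty minutes at this schedule yields 12,000 snapshots across the 20 volumes, and a single LUN would reach the 60,000-snapshot retention limit after 50 hours.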

As can be seen in Figure 7, performance remained steady throughout the test. The Huawei OceanStor Dorado V6 sustained more than 800,000 IOPS with an average response time of less than 500 µs.

Why This Matters

With the number of tools and technologies that exist in a traditional enterprise environment, the cost and complexity related to maintaining the infrastructure, ensuring constant uptime, and guaranteeing performance levels can easily get out of hand.

ESG validated that a single Huawei OceanStor Dorado V6 NVMe storage system was able to deliver higher performance with lower latency than the previously tested OceanStor Dorado V3. As the simulated database workload ramped up on one server, response time remained extremely low. ESG confirmed that the OceanStor Dorado V6 sustained more than 220,000 IOPS with an average response time of just 81 µs. When we scaled the workload up to a 12-node Oracle RAC cluster, we confirmed nearly 850,000 IOPS with an average response time of 477 µs. While running this workload, we saw no discernible impact of snapshots, even though we were creating them at the rate of one every three seconds per volume. This level of performance translates directly to lower upfront and ongoing costs because a given workload can potentially be serviced by a smaller OceanStor Dorado configuration.


Availability

The Huawei OceanStor Dorado V6 all-flash storage platform is designed to ensure high availability and sustain performance through both planned maintenance and unplanned outages.

  • An OceanStor Dorado V6 array can house from two to 32 controllers (see note 3) running in active/active mode. Huawei’s SmartMatrix architecture can tolerate the failure of seven out of eight controllers, or three out of four in a four-controller system. SmartMatrix architecture supports full interconnection between front-end interface cards, controllers, and back-end disk enclosures. Combined with cache triple copy and continuous mirroring in software, this provides extremely high hardware fault tolerance and helps to ensure uninterrupted business continuity.
  • HyperMetro provides a gateway-free active/active high-availability solution between OceanStor Dorado V6 systems, either in the same data center or in different data centers that are up to 100 km apart. HyperMetro maintains data consistency between the storage arrays and handles failures in two ways: if one array fails, HyperMetro immediately switches the failed array’s workload to the redundant array; if a link between a host and an array fails, HyperMetro directs the host to the other array, which continues to provide data access.
  • Huawei’s implementation of RAID-TP can tolerate up to three simultaneous disk failures. Parity data and hot spare space are spread across all disks in the storage pool, decreasing reconstruction time and increasing overall storage availability to better ensure continuous data access.

ESG tested using a single host attached to the OceanStor Dorado 18000 V6 via two 16GFC links. We began by running a test workload using Vdbench to generate traffic against four LUNs in the system. The total workload generated by the server was 111,782 IOPS. Each of the four LUNs was servicing approximately 28,000 IOPS, and each controller in the array was also processing approximately 28,000 IOPS.

I/O response time was under 100 µs across all four controllers. Next, we disabled one of the four LUNs.

Figure 8 shows that when the total IOPS for LUN_0001 dropped to zero, the four controllers balanced the load on the remaining three LUNs evenly. Next, we pulled three of the four controllers (Controller B, then A, then C) out of the array to see how the system would react. Figure 9 shows the performance of the array as the controllers are removed. When we started the test, the system was servicing about 200,000 IOPS with an average response time of 18.5 µs.

As we pulled each controller, the remaining controllers took up the slack and response time spiked briefly but never exceeded 115 µs. When controllers A, B, and C were all offline, Controller D was servicing the entire 200,000 IOPS workload at 18 µs.

Figure 9 also shows that as each downed controller was brought back online, traffic was evenly balanced across the active controllers, until the original four controllers were again sharing the load.
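
The rebalancing described above implies a simple per-controller load at each stage of the test. The sketch below assumes the roughly 200,000 IOPS workload stays constant and is split evenly across whichever controllers are online; the function name is hypothetical, and the figures are derived arithmetic, not measured values.

```python
def per_controller_iops(total_iops: float, active_controllers: int) -> float:
    """Average load per controller, assuming the workload is balanced evenly."""
    if active_controllers < 1:
        raise ValueError("at least one controller must remain online")
    return total_iops / active_controllers


TOTAL_IOPS = 200_000  # approximate workload reported for the controller-pull test

for active in (4, 3, 2, 1):
    load = per_controller_iops(TOTAL_IOPS, active)
    print(f"{active} controller(s) online: ~{round(load):,} IOPS each")
```

With one controller left, that controller carries four times its original share, which is the condition under which the test still reported an 18 µs response time.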

It’s important to note that the system never stopped servicing I/O and response time averaged under 20 µs during this test. This is important both when recovering from an unplanned event and when executing planned maintenance, like a software update or an in-place controller upgrade.

In the triple disk failure test, as we pulled each disk, IOPS gradually decreased and response time gradually increased as the array had to reconstitute data from parity on the fly.

The array remained available throughout the triple disk failure without a significant degradation in performance. Total IOPS decreased from an average of approximately 100,000 to approximately 80,000, while response time increased from the previously observed average of 250 µs to an average of 850 µs.

Finally, ESG looked at RAID-TP rebuild times. With increasing data set sizes, fast rebuild time after a disk failure or replacement is critical. We pulled a disk from an array populated with 36 disks that hosted a 7.1TB volume. Figure 10 shows the actions performed by the array to rebuild the data from the failed disk and the elapsed time.

When the array detects the removed disk, the reconstruction process begins. As Figure 10 shows, the rebuild of the 7.1TB volume took just six minutes to complete, 33% faster than the OceanStor Dorado V3 we tested in 2018.

ESG tested HyperMetro in an on-campus setting, with two OceanStor Dorado V6s named DoradoV6_A and DoradoV6_B configured as a HyperMetro pair in different areas in the same data center. We began by running an OLTP workload to simulate transactional database traffic on an Oracle 12c Real Application Cluster (RAC). Each server was dual attached to the SAN and zoned to have access to both controllers on each array, and the two arrays were connected via dual FC links.

Figure 11 shows the IOPS and response time of DoradoV6_A as the test was executed. The total workload generated by the two servers was approximately 7,500 IOPS. Each array was processing half of the workload, 3,750 IOPS on average. Average I/O response time was 100 µs across all four controllers.

Next, we simulated a failure of DoradoV6_B by pulling the power cord and observed how the storage responded after HyperMetro moved 100% of the workload onto DoradoV6_A. When DoradoV6_B lost power, there was a brief drop in I/O and the response time peaked at 4 ms, then settled back to 100 µs with all I/O being serviced by DoradoV6_A. We powered DoradoV6_B on and watched as the system automatically recovered. All writes that had accumulated during the outage were synced back to DoradoV6_B completely non-disruptively.

Finally, we tested non-disruptive planned maintenance. Specifically, we upgraded a controller while the system was running, simulating a customer’s experience with Huawei’s FlashEver upgrade service.

With the same workload running that we used in the HyperMetro test, we selected FRU replacement and opened the Replace Controller Wizard. With a couple of clicks, we took Controller B out of service and initiated the replacement. The system ran through a number of checks as it brought the new controller online. The workload continued to run uninterrupted, and response time averaged 200 µs.

Why This Matters

Customers considering solid-state storage deployment are looking not just for performance, but to increase data availability so that users have continuous data access for mission-critical business applications. If the underlying storage is not highly available, the risk of downtime and, subsequently, lost productivity and revenue becomes real. In today’s business climate, IT professionals must ensure business continuity for data-intensive applications, from traditional CRM to real-time analytics and online transactions, to help their users respond to customer needs quickly.

ESG validated that the SmartMatrix fully interconnected architecture of the OceanStor Dorado V6 is highly available and provides extremely low-latency performance during planned and unplanned outages. We verified that the active/active controllers of the Huawei OceanStor Dorado V6 provide consistent, evenly balanced performance across all controllers in a system across a wide variety of situations, from failed disks to FRU replacement and firmware upgrades, to site outages. ESG was particularly impressed with the ability of the system to sustain high levels of performance with sub-millisecond response times through multiple disk failures, during fast RAID rebuilds, and after the failure of an entire array using HyperMetro.


Automated Data Management with Huawei AI Architecture

Huawei AI architecture is designed to help organizations automate data management across three layers: DeviceManager handles configuration, operations, and maintenance at the system and component level; OceanStor DJ virtualizes storage systems into a programmable, service-based storage pool; and the Huawei eService O&M platform provides intelligent lifecycle management, from resource provisioning to fault fingerprinting to predictive analytics and optimization. ESG tested the eService platform, examining the insights available to administrators. As seen in Figure 13, a health check of our test system showed that while the system was at just over 63% capacity, it would exceed 80% capacity in less than one month. eService also suggested that the system would need an additional 370 TB of capacity over the next 12 months.

This exercise was performed at the device level, but Huawei offers the same optimized operations and maintenance across the data center, and in the cloud.

Total Cost of Ownership (TCO)

ESG modeled and compared the storage-related costs that could be expected when deploying traditional hybrid storage with SAS-based SSD and disk and a Huawei OceanStor Dorado 5000 V6 with NVMe SSDs. The costs associated with purchasing, maintaining, powering, and cooling the storage systems were calculated in US Dollars, and the average cost for electricity in the United States as reported by the US Energy Information Administration4 was used to calculate power and cooling costs. ESG modeled the expected storage total cost of ownership (TCO) for a company that needed to support a highly available mixed-workload production environment. All workloads were assumed to require sub-millisecond response times:

  • A 10TB Oracle RAC OLTP environment able to support 100,000 IOPS and sub-millisecond response times.
  • A 9.6TB, 1,000-seat VDI deployment for heavy users (20 IOPS per user).
  • A 5TB Microsoft Exchange environment to support 5,000 heavy users at one IOPS per user.
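
The three workloads above can be totaled to see the aggregate capacity and IOPS the modeled system has to support. The sketch below only sums the figures stated in the list; the `Workload` type and variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    capacity_tb: float
    iops: int


workloads = [
    Workload("Oracle RAC OLTP", 10.0, 100_000),
    Workload("VDI (1,000 seats x 20 IOPS)", 9.6, 1_000 * 20),
    Workload("Exchange (5,000 users x 1 IOPS)", 5.0, 5_000 * 1),
]

total_capacity_tb = sum(w.capacity_tb for w in workloads)
total_iops = sum(w.iops for w in workloads)

print(f"Total capacity: {total_capacity_tb:.1f} TB")
print(f"Total IOPS: {total_iops:,}")
```

The modeled environment therefore needs about 24.6 TB of capacity and 125,000 IOPS in aggregate, before any allowance for growth.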

The Huawei OceanStor Dorado 5000 V6 was populated with 18 1.92TB NVMe SSDs configured in a single RAID-TP group. We compared that with a hybrid storage system from a major vendor populated with 12 900GB 2.5” SAS SSDs and 652 900GB 2.5” SAS HDDs configured in RAID5 groups. Both systems were modeled with equivalent software, power supplies, racks, and accessories according to each manufacturer’s best practices.
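
The two configurations differ sharply in drive count and raw capacity. The sketch below works only from the drive counts and sizes stated above. The post-parity figure for the Dorado is an assumption: it treats triple parity as costing the equivalent of three drives in the single RAID-TP group and ignores distributed hot spare space, formatting overhead, and data reduction. No usable figure is estimated for the hybrid system because its RAID5 group layout is not given.

```python
def raw_tb(drive_count: int, drive_tb: float) -> float:
    """Raw capacity in decimal TB."""
    return drive_count * drive_tb


# Huawei OceanStor Dorado 5000 V6: 18 x 1.92TB NVMe SSDs, one RAID-TP group
dorado_drives = 18
dorado_raw = raw_tb(dorado_drives, 1.92)
# Assumption: triple parity consumes the equivalent of three drives.
dorado_after_parity = raw_tb(dorado_drives - 3, 1.92)

# Hybrid system: 12 x 900GB SAS SSDs + 652 x 900GB SAS HDDs
hybrid_drives = 12 + 652
hybrid_raw = raw_tb(12, 0.9) + raw_tb(652, 0.9)

print(f"Dorado: {dorado_drives} drives, {dorado_raw:.2f} TB raw, "
      f"~{dorado_after_parity:.1f} TB after parity (assumed)")
print(f"Hybrid: {hybrid_drives} drives, {hybrid_raw:.1f} TB raw")
```

Under these assumptions the comparison is 18 drives and 34.56 TB raw against 664 drives and 597.6 TB raw.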

TCO was calculated using a simplified model based on costs that would be incurred over a five-year period without taking into consideration capacity and performance growth requirements or IT operational costs. Maintenance and support contracts, along with typical customer discounts for hardware, software, and maintenance were factored into the estimated costs. Figure 14 shows the TCO cost comparison between hybrid storage and the Huawei OceanStor Dorado 5000 V6.

Over five years, TCO for the hybrid storage system totals $697,568, while the cost for the OceanStor Dorado 5000 V6 is only $150,836, just 21.6% of the hybrid storage total cost. Cost savings were similar across all three measured categories: hardware/software, maintenance/support, and power/cooling.
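
The percentages follow directly from the two five-year totals. The short check below uses only the dollar figures reported above; the variable names are illustrative.

```python
HYBRID_TCO_USD = 697_568   # five-year TCO, hybrid storage system
DORADO_TCO_USD = 150_836   # five-year TCO, OceanStor Dorado 5000 V6

ratio = DORADO_TCO_USD / HYBRID_TCO_USD
savings_usd = HYBRID_TCO_USD - DORADO_TCO_USD
savings_pct = (1 - ratio) * 100

print(f"Dorado TCO is {ratio:.1%} of the hybrid TCO")
print(f"Savings: ${savings_usd:,} ({savings_pct:.1f}%)")
```

The difference is $546,732, or about 78.4% of the hybrid total, which rounds to the 78% savings cited elsewhere in this review.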

It’s worth noting that given the availability and AI-driven automation features of the Huawei OceanStor Dorado V6, IT operational costs should be lower as well. Based on our research and experience, SSDs encounter fewer storage failures in the field, which translates into less time and resources spent replacing disks.

Why This Matters

Organizations understand the advantages to be gained by selecting a next-generation, purpose-built all-flash array designed to take advantage of the benefits of flash technology over simply adding flash drives to a traditional storage system with traditional limitations. While IT professionals seek high storage performance and availability, they still need to minimize both initial capital outlays and related operational costs over time. Solid-state storage has become a viable option as SSD prices have declined while their reliability has increased. Simultaneously, the costs for operating and maintaining a hybrid SSD+HDD environment tend to get higher over time, especially as HDD reliability degrades with continuous use.

ESG compared the five-year TCO of the OceanStor Dorado 5000 V6, populated with NVMe SSDs, against that of a hybrid storage system with a mix of SAS SSDs and HDDs. The results showed that the five-year TCO of the OceanStor Dorado V6 is 78% lower than that of the hybrid array, with savings spread evenly across capital outlay for hardware/software, maintenance/support contracts, and power and cooling. ESG also expects IT operational costs to decrease due to the increased reliability of SSDs and the automation and availability features that Huawei has built into its all-flash array, allowing IT professionals to spend more time and resources on strategic activities, rather than maintaining storage.


The Bigger Truth

ESG research reveals that organizations have realized numerous benefits as a result of deploying flash storage, including improved application performance (48%) and improved total cost of ownership (47%). In the same survey, respondents identified both a desire to overall “future proof” the environment (56%) and to improve the performance of existing applications (55%) as among the objectives motivating their interest in NVMe, making them the two most-cited responses, based on the promise of reduced latencies and higher overall performance offered by the technology.

Data growth is accelerating, and the resulting infrastructure required to store and protect that data is costly and complex. Organizations are tasked with providing a high-quality, predictable, and productive computing environment for an ever-growing number of internal users and external customers while enterprise application environments have become increasingly unpredictable as their underlying IT infrastructure grows in complexity and size. Mission-critical business application performance is sensitive to storage performance and latency and highly dependent on the resilience of the IT environment.

The OceanStor Dorado V6 all-flash storage system is designed to handle mission-critical applications and workloads, both internal and customer-facing. The OceanStor Dorado V6 leverages an active/active multi-controller, NVMe architecture, and enterprise-class availability features implemented in software to provide a platform engineered for consolidating mission- and business-critical workloads at extremely low latencies.

ESG testing validated the Dorado V6’s ability to consolidate the most challenging business- and mission-critical workloads onto a single, high-performance, highly available platform. The environment ESG tested serviced an Oracle RAC 18c database environment with 24 TB of live data on a single Dorado 18000 V6 system. The OceanStor Dorado V6 serviced more than 220,000 IOPS at just 81 µs average response time and scaled to more than 846,000 IOPS with 477 µs average response time, all while providing continuous access through multiple planned and unplanned outage tests. It’s important to note that the performance described in this report was accomplished with adaptive data reduction and snapshots enabled and in use.

The results that are presented in this document are based on testing in a controlled environment. Due to the many variables in each production data center, it is important to perform planning and testing in your own environment to validate the viability and efficacy of any solution.

ESG is pleased to validate that the Huawei OceanStor Dorado 18000 V6 delivers consistently high performance at extremely low response times and is clearly well suited to support demanding real-world business applications running in a performance-critical highly virtualized environment. Next-generation all-flash storage systems are designed with a goal of making the best possible use of flash technology while avoiding many of the limiting factors of traditional storage systems. Huawei designed its all-flash array around the dual goal of solving business problems as well as storage problems.

It is no surprise that ESG’s five-year analysis demonstrated that by deploying an OceanStor Dorado V6 rather than an alternative hybrid storage system, organizations can lower their storage TCO by 78% while improving availability and reducing operational effort. If your organization is looking to lower storage TCO while increasing performance, ESG recommends investing in a next-generation all-flash array, and Huawei is worth a closer look.



1. Source: ESG Master Survey Results, 2019 Data Storage Trends, November 2019. All ESG research references and charts in this technical review were taken from this master survey results set, unless otherwise noted.
2. Contact your local Huawei supplier for more details.
3. 32 controller configurations will be available in the OceanStor Dorado 18000 V6 in 2020.
4. https://www.eia.gov/
This ESG Technical Review was commissioned by Huawei and is distributed under license from ESG.