ESG Validation

ESG Lab Review: Rack-scale Flash Architecture with E8 Storage

Abstract

This ESG Lab Review documents the results of hands-on testing of the E8 Storage E8-D24 all-flash storage array solution with a focus on resiliency and performance.

Market Trends in Solid-state Storage

The latest wave of solid-state use started with enterprise storage vendors shipping solid-state drives designed to fit in external disk storage subsystems. Therefore, it is not surprising to find that more than two-thirds of users reported leveraging this option in 2015 and that one-third of these organizations considered it their primary implementation type. The other top implementation type reported, extended cache/memory/primary storage in a server, clearly demonstrates that users were aware that solid-state implementation choices are no longer limited to being server- or storage-system resident.1

What benefits have users derived from solid-state storage? Since performance had such a significant impact on early solid-state storage adoption, it follows that a majority of users focused on improved application performance. The other most commonly identified benefits included improved resource utilization and economic windfall in the form of reduced OpEx and/or better TCO (see Figure 1).2

The Solution: E8-D24

The E8 Storage E8-D24 storage system is an all-flash shared block storage solution that uses NVMe SSDs internally and RDMA over Converged Ethernet (RoCE) externally to deliver high performance and low latency along with enterprise-class high availability. The solution includes the E8-D24 highly available storage array and the E8 client software, which are connected through a customer-supplied RoCE network infrastructure. The E8-D24 presents servers/hosts with the performance characteristics of internal NVMe drives along with the advantages of shared storage like high availability, centralized management, and lower effective storage cost.

The E8-D24 storage array contains the NVMe SSDs, two storage controllers, and 100GbE network ports. The NVMe protocol eliminates the processing overhead and latency associated with conventional drive interface protocols like Fibre Channel, SAS, and SATA, so NVMe-connected SSDs provide higher IOPS and bandwidth, and lower latency, than conventional SSDs. The storage controllers are commodity Intel x86 servers, with internal redundancy for no single point of failure. The RoCE host-storage connection, over 10GbE to 100GbE, provides higher bandwidth, and the protocol's remote DMA takes the storage controller out of the I/O path to deliver lower latency than conventional host-storage connections, along with Ethernet's routing and switching flexibility. The solution includes the E8 client, which offloads data path processing from the storage array and presents capacity to the host as a block storage volume.

The E8-D24 NVMe storage array is built from standard, readily available hardware. The 2U enclosure is supplied by an Intel white-box ODM and has slots for up to 24 2.5-inch NVMe SSDs. Each of the SSDs is connected to two controllers via a passive PCIe mid-plane. Leveraging dual-ported NVMe SSDs, dual storage controllers, and cross connections means that both controllers can access all the SSDs. The open NVMe ecosystem enables the use of the latest NVMe SSDs from partners such as Intel, HGST, and Samsung. The system used for the lab review performance testing had drives from the HGST SN200 series installed. With 7.68TB HGST drives, accounting for overprovisioning and RAID6, the solution supports up to 154TB raw and 141TB usable capacity. Host-side networking is implemented with two dual-ported Mellanox ConnectX-4 100GbE network adapters per storage controller, for a total of eight 100GbE host connections per E8-D24 array.
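
For reference, one set of assumptions that reproduces the capacity figures above is sketched below in Python: each 7.68TB drive overprovisioned to an assumed 6.4TB, with RAID6 striped across all 24 drives as 22 data plus 2 parity. E8 Storage does not publish the exact reserve or stripe geometry here, so treat this as illustrative arithmetic only.

    # Illustrative capacity arithmetic; the 6.4TB overprovisioned figure is an
    # assumption chosen to reproduce the quoted numbers, not a published spec.
    drives = 24
    provisioned_tb = 6.4                        # assumed per-drive capacity after overprovisioning
    raw_tb = drives * provisioned_tb            # 153.6 -> "up to 154TB raw"
    usable_tb = (drives - 2) * provisioned_tb   # dual parity -> 140.8 -> "141TB usable"
    print(f"Raw: {raw_tb:.1f} TB, usable: {usable_tb:.1f} TB")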

As shown in Figure 4, the RDMA fabric gives the array the flexibility of a single converged Ethernet network along with the low latency and high throughput required to use all the performance of the NVMe drives. Since a single NVMe SSD can support 20 Gbps of bandwidth, 10GbE ports on servers are becoming inadequate. The newer 25GbE line rate lets a server keep its expected SSD performance, with a little bandwidth left over for background processes. In an example installation, a 32 x 100GbE top-of-rack switch can support 96 servers (the E8-D24's maximum host count) at up to 25 Gbps each. The 32 100GbE switch ports are allocated as: 8 ports of 100GbE for the array; 4 ports of 100GbE for network uplinks; and 24 ports of 100GbE cable-split into four 25GbE ports each, for a total of (24 × 4) 96 hosts. Hosts must have an RDMA-enabled NIC running at 10GbE or higher; 25GbE is recommended, per the above.
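
To make the port arithmetic concrete, the short Python sketch below tallies the same budget; the counts and line rates are those of the example installation above, and nothing here is E8-specific.

    # Port budget for the example 32 x 100GbE top-of-rack switch.
    switch_ports = 32       # 100GbE ports on the switch
    array_ports = 8         # 100GbE host connections on the E8-D24
    uplink_ports = 4        # 100GbE network uplinks
    split_ratio = 4         # each host-facing 100GbE port cable-splits into 4 x 25GbE

    host_facing = switch_ports - array_ports - uplink_ports
    print(f"Host-facing 100GbE ports: {host_facing}")                # 24
    print(f"Maximum hosts at 25GbE:   {host_facing * split_ratio}")  # 96

    # A single NVMe SSD can stream ~20 Gbps, so a 25GbE host link
    # leaves ~5 Gbps of headroom for background traffic.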

The E8-D24 storage solution software consists of a host server-resident client and the E8-D24 array software. The E8 client runs on many popular Linux distributions, including RHEL, CentOS, Ubuntu, SLES, and Debian. The client runs as a user-space service and typically consumes one CPU core. The client handles much of the data path processing to minimize the load on the storage controller CPU, and presents capacity as raw block devices to the connected host. The array software manages data protection, including RAID6, and capacity management; the capacity presented to the host is represented on the array as volumes. The software, along with the RoCE protocol, adds only about 10 microseconds of latency to a host request to an NVMe drive.
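
As a rough sanity check on that ~10 microsecond figure, the sketch below adds the client/RoCE overhead to a raw drive access; the 100-microsecond drive read latency is an assumed typical value for an NVMe SSD, not a number from this report.

    # Rough end-to-end latency budget under stated assumptions.
    drive_read_us = 100      # assumed typical NVMe SSD read latency (not measured here)
    e8_overhead_us = 10      # client software + RoCE overhead cited above

    print(f"Estimated end-to-end read latency: {drive_read_us + e8_overhead_us} us")
    # In the same ballpark as the ~120us low-queue-depth latency ESG Lab
    # reports in the Performance section below.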

Software aspects of the RDMA fabric help the E8-D24 deliver performance. The solution uses RoCEv2, which achieves low latency through NIC-level congestion control and retransmissions. The E8-D24 supports network QoS in both lossless (LLFC, PFC) and lossy network configurations. In large-scale deployments, E8 Storage recommends configuring RoCEv2 with Explicit Congestion Notification (ECN) so that a lossless network is not mandatory. Network high availability is as simple as using two Ethernet switches instead of one, and the E8 client handles multi-pathing for high availability.

ESG Lab Tested

ESG Lab conducted remote hands-on validation of the E8 Storage E8-D24 solution by leveraging connectivity to a test environment located in an E8 Storage R&D facility in Tel Aviv, Israel. ESG Lab tested ease of management and resiliency, and exercised the product to determine its performance and scale characteristics.

Easy-to-Manage Resiliency

ESG Lab started with an exploration of the E8-D24 management interface, then tested fault recovery scenarios and observed how the solution handles different component failures.

The E8-D24 provides a web management interface for configuring, monitoring, and maintaining the storage appliance. Since the solution software handles the array back-end functions out of sight of the user, the management GUI consists of just six tabs. The Dashboard tab provides a summary overview of array status, showing performance, a list of recent events, an Alerts indicator, Hosts and Volumes, and capacity usage. The System tab provides a graphic representation of the array hardware components and their status (see Figure 5). The Hosts tab shows connected hosts and their status (mapped or unmapped) and can display performance metrics on a per-host basis. The Volumes tab shows volume status (mapped or unmapped), can display performance on a per-volume basis, and provides controls for adding volumes. The System Log tab shows log entries, and the Settings tab configures network, system, user, and notification settings.

The solution is designed for enterprise-class high availability, with no single point of failure and multiple and/or redundant data paths. SSDs are hot-pluggable and dual-ported, and there are two redundant PCIe buses in the chassis drive backplane. There are two storage controllers, and both controllers are connected to both backplane buses. Each controller can have up to four 100GbE ports for host I/O. The E8 client software on the host server handles multi-pathing, including path failover, and can accommodate dual Ethernet switches for network redundancy. Data held on the SSDs is protected with dual-parity RAID6. There are two redundant power supply/fan units, each with its own backup battery.

ESG Lab tested failure and recovery of several E8-D24 system components while the array was running a 2 million IOPS mixed read/write workload. As shown in Figure 6, we tested SSD failure and replacement, storage controller failure and replacement, and a network port failure with load balancing. We used the management interface’s Dashboard tab to observe the impact of the failure and the array’s ability to recover.

E8-D24 Failure Injection Scenarios:

  • Failure scenarios included a drive, controller, and network port failure.
  • A steady state 2 million IOPS workload was used for all three failure scenarios.
  • Controller failure details are reported in a subsequent paragraph of this report.
  • First, an SSD was removed from service; I/O paused, then resumed in degraded mode in less than 30 seconds.
  • Then the SSD was reinstalled; I/O paused again, then resumed, still in degraded mode, in less than 30 seconds.
  • I/O remained in degraded mode during the RAID6 rebuild, then returned to full steady state in less than 30 seconds.
  • For network failure testing, a port was disabled and removed from service.
  • I/O dropped on failure, then automatically returned to steady state in less than 15 seconds using multi-pathing.

Finally, as shown in Figure 7, the management interface dashboard view was used to display the performance impact of a storage controller failure. To demonstrate this, we first powered down the storage controller. The UI’s System tab (not shown in the figure) reported an alert for the storage controller failure at 11:26:44 and highlighted the failed component in red, indicating that its network interfaces were no longer available.

As shown in Figure 7, the Dashboard tab showed the array’s failure handling—total I/O dropped to zero upon failure, but returned to the full 2M IOPS level at 11:27:11.
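
ESG Lab read these recovery windows off the Dashboard. For readers who want to reproduce the measurement from a Linux host, the minimal sketch below samples /proc/diskstats on the device under load; the device name and the 90%-of-steady-state threshold are our assumptions, and nothing about it is specific to the E8 client.

    # Measure how long a block device's IOPS stay below steady state after a fault.
    # The device name and recovery threshold are illustrative assumptions.
    import time

    DEVICE = "nvme0n1"       # hypothetical block device backed by the E8 client
    THRESHOLD = 0.9          # fraction of steady-state IOPS that counts as recovered

    def completed_ios(dev):
        # /proc/diskstats fields 4 and 8 are reads and writes completed.
        with open("/proc/diskstats") as stats:
            for line in stats:
                fields = line.split()
                if fields[2] == dev:
                    return int(fields[3]) + int(fields[7])
        raise ValueError(f"device {dev} not found")

    def iops(dev, interval=1.0):
        before = completed_ios(dev)
        time.sleep(interval)
        return (completed_ios(dev) - before) / interval

    steady = iops(DEVICE)            # sample steady state before the fault
    input("Inject the fault, then press Enter...")
    start = time.time()
    while iops(DEVICE) < THRESHOLD * steady:
        pass                         # each iops() sample takes one second
    print(f"Recovered to >={THRESHOLD:.0%} of steady state in {time.time() - start:.0f} s")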

Why This Matters

As storage technology continues to evolve, from HDD to SSD and now NVMe, it still needs to meet the foundational requirements of infrastructure: it has to be dependable and easy to use, because data loss or downtime translates directly into lost revenue or an impaired ability to serve users. And infrastructure that's easy to manage requires fewer people, helping keep operational expenses under control.

The E8-D24 storage array solution from E8 Storage has a straightforward, browser-based web UI and an enterprise-class high-availability design. It took ESG Lab just a few mouse clicks to navigate the management UI, and the E8 software keeps the interface simple. In the most impactful test, involving the failure of a storage controller, the array returned to full operation in 27 seconds, well within the I/O retry timeout of most servers.


Performance

Performance is “the execution of an action.”3 High performance is when the action is executed faster, better, or more efficiently than others.4 For this review, ESG Lab used performance as a key indicator of how well the E8 storage array was able to deliver the storage resources needed to perform the business tasks at hand.

ESG Lab’s performance validation began with an examination of the I/O and throughput capabilities of the E8 Storage solution and included IOPS, latency, and throughput analysis. For testing, the FIO workload generator tool was used to simulate multiple workloads.
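
As an illustration, the Python sketch below launches an FIO run approximating the 4k 100% random read test described next; the target device, queue depth, and runtime are our assumptions, while the block size, access pattern, and per-host job scaling come from the report.

    # Launch fio with parameters approximating the 4k 100% random read test.
    # The device path, iodepth, and runtime are illustrative assumptions.
    import subprocess

    def run_randread(device="/dev/nvme0n1", numjobs=32, iodepth=32, runtime_s=60):
        subprocess.run([
            "fio",
            "--name=e8-randread",
            f"--filename={device}",
            "--ioengine=libaio",        # asynchronous direct I/O
            "--direct=1",               # bypass the page cache
            "--rw=randread",            # 100% random read
            "--bs=4k",                  # block size used in the report
            f"--numjobs={numjobs}",     # scaled from 1 to 32 per host in testing
            f"--iodepth={iodepth}",
            f"--runtime={runtime_s}",
            "--time_based",
            "--group_reporting",
        ], check=True)

    if __name__ == "__main__":
        run_randread()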

As shown in Figure 8, a 100% random read workload was used to demonstrate the maximum read I/O capability of the solution. Here, the test environment was configured with 16 Linux hosts, and the number of FIO jobs per host was scaled until the maximum I/O was observed (10,319,078 IOPS).

What the numbers mean:

  • At low queue depths, 120-microsecond end-to-end latency was achieved.
  • The E8-D24 delivered 4.7 million IOPS at a latency of approximately 220 microseconds, and 9.5 million IOPS at a latency of approximately 540 microseconds (see the consistency check following this list).
  • Up to 7 million IOPS, the E8-D24 provided latency of less than 300 microseconds.
  • During the 100% random read testing, we scaled the environment from 1 to 32 workload jobs per host.
  • With a block size of 4k and a 100% random read workload, the E8-D24 array delivered over 10 million IOPS.
  • At 10,319,078 IOPS, the latency of the solution was less than 3.2 milliseconds.
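
The consistency check mentioned above applies Little's Law (outstanding I/Os = IOPS × latency) to the measured points; the per-job queue-depth interpretation at the end is our inference, not a reported configuration.

    # Little's Law: I/Os in flight = IOPS x latency.
    for iops_m, latency_us in [(4.7, 220), (9.5, 540)]:
        in_flight = iops_m * 1e6 * latency_us * 1e-6
        print(f"{iops_m}M IOPS at {latency_us}us -> ~{in_flight:,.0f} I/Os in flight")
    # ~1,034 and ~5,130 outstanding I/Os, which across 16 hosts running up to
    # 32 jobs each implies modest per-job queue depths (roughly 2 to 10).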

Next, as shown in Figure 9, a 70% random read with 30% random write workload was used to simulate an online transaction processing (OLTP) database workload. Here the test environment was scaled to 20 Linux hosts with up to 16 jobs per host.

What the numbers mean:

  • During the 70% random read/write testing, we scaled the environment from 1.2 to 3.2 million IOPS.
  • Read/write latency observations during testing were as low as 356 microseconds at 1.2M IOPS.
  • Average write latency observed during testing ranged from 93 to 953 microseconds.
  • With a block size of 8k and a random 70/30 read/write workload, the E8-D24 array delivered over 3.2 million IOPS.
  • At 3,267,776 IOPS, the latency of the solution was only 5 milliseconds.
  • All testing was done with RAID6 striped on 24 SSDs.

Finally, as shown in Figure 10, ESG Lab analyzed the throughput capabilities of the E8 storage solution. Here the test environment was scaled from 1 to 20 Linux hosts with 16 FIO jobs per host.

What the numbers mean:

  • The solution scaled from approximately 4.5 GB per second to approximately 43 GB per second.
  • The solution was able to deliver a maximum of 43,396,548 KB per second with 20 host connections (converted to per-host bandwidth below).
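
The arithmetic below converts that peak figure to per-host bandwidth, as referenced above; it assumes decimal kilobytes (1 KB = 1,000 bytes), which matches the ~43 GB/s figure quoted in this list.

    # Convert peak throughput to per-host bandwidth.
    # Assumes 1 KB = 1,000 bytes, consistent with the ~43 GB/s figure above.
    peak_kb_s = 43_396_548
    hosts = 20

    total_gb_s = peak_kb_s * 1_000 / 1e9        # ~43.4 GB/s
    per_host_gbps = total_gb_s / hosts * 8      # ~17.4 Gbps per host
    print(f"Total: {total_gb_s:.1f} GB/s; per host: {per_host_gbps:.1f} Gbps")
    # ~17.4 Gbps per host fits within a 25GbE link, matching the
    # network sizing discussed in the solution overview.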

Why This Matters

For a business to be successful, it must perform well at all levels. This includes high performance for the infrastructure that supports business operations. An antiquated storage infrastructure that requires a huge investment in storage resources for a traditional upgrade should not force an organization to choose between productivity and the status quo.

ESG Lab confirmed that the E8 storage solution was easily able to deliver high-performance I/O to the business applications that demand it. The E8-D24 storage array, in its efficient 2U form factor, was able to deliver over 10 million IOPS for a 100% random read workload and over 43 GB/sec of throughput for large block read operations. With latencies of less than half a millisecond, the solution was still able to deliver 1.6 million IOPS for an OLTP workload.


The Bigger Truth

Since this latest wave of solid-state use started with enterprise storage vendors shipping solid-state drives designed to fit in external disk storage subsystems, it is not surprising to find that more than two-thirds of users surveyed by ESG were leveraging this option in 2015. However, ESG research also indicates that users were aware that solid-state implementation choices are no longer limited to being storage-system-resident only.

Customers once had to face a tough choice. They could use SSDs as local storage in their servers to get high performance or they could put those SSDs in all-flash arrays in a SAN to gain enterprise manageability and provisioning capabilities.

Now, E8 Storage's availability architecture, combined with its design for high performance and low latency, overcomes a traditional drawback of dual-controller storage architectures. The E8-D24 does not have the cache consistency issues between storage controllers that a legacy array does, which greatly simplifies the internal array software. Instead, the E8-D24 relies on its low latency to get data to the non-volatile SSDs and acknowledge the I/O to the host without staging data in the controller itself. In summary, the E8-D24 delivers high performance and low latency because the E8 client software running on a host server communicates using RoCE almost directly with the NVMe drives in the array; the array CPU is not involved in I/O.

Initial E8 customers in financial services and retail, running both transactional and analytics applications, leverage the E8-D24's ease of use to keep up with business growth and its high availability as a foundation for critical applications. ESG believes that if E8 Storage can field-prove its reliability over time, and continue to add support for additional OSs, it has the opportunity to help redefine how flash infrastructures are deployed.



1. Source: ESG Research Report, 2015 Data Storage Market Trends, October 2015.
2. Ibid.
3. Source: Merriam-Webster Dictionary, Definition of Performance.
4. Source: Merriam-Webster Dictionary, Definition of High Performance.