ESG Technical Review: Reducing Operational Complexity with Emulex SAN Manager

Co-Author(s): Alex Arcilla


This ESG Technical Review documents hands-on testing of Emulex SAN Manager (ESM) by Broadcom to verify its ability to reduce operational cost and complexity in the modern data center with end-to-end in-band monitoring and management. We focused on how ESM provides visibility into the operational status of all Fibre Channel (FC) host bus adapters (HBAs) across the SAN and accelerates management, upgrades, and troubleshooting.

The Challenges

ESG research has found that nearly two-thirds (64%) of organizations surveyed view their IT environments as more complex than they were two years ago (see Figure 1). Although numerous factors contribute to such complexity, higher data volumes were the most-cited reason.1 Organizations may not welcome the resulting complexity, but they do recognize that storing and managing data is important to the business: 71% consider data storage technology to be strategic to their operations.2

End-users need optimal application performance to meet business requirements promptly. If an organization cannot get insights from business analytics quickly and efficiently, opportunities to generate revenue or decrease costs are put at risk. As organizations generate and collect growing amounts of data, removing congestion points that prevent timely access to data becomes paramount. However, current methods for managing the data center environment typically rely on separate server and storage interfaces to manage disparate FC HBAs, storage, and networking resources. Separate management interfaces lead to operational complexity and, in turn, higher operational costs. To minimize both, organizations need a solution that integrates the management of compute, storage, and network resources while simplifying troubleshooting.

The Solution: Emulex SAN Manager

Emulex SAN Manager was developed to help organizations simplify management and troubleshooting of FC HBAs via the SAN, reducing operational cost and complexity. ESM is a lightweight application that reaches servers in-band over Fibre Channel, with no agents and no need for separate Ethernet connections to the servers.

ESM functionality falls under three categories:

  • Monitor—Organizations can gain visibility into the current inventory and operational status of HBA resources. This enables organizations to identify hardware issues, identify needed firmware updates, and run queries and generate reports of HBA status.
  • Manage—ESM enables organizations to manage the HBAs directly through the SAN. Organizations can make changes directly on the HBAs in-band without having to coordinate tasks between server, storage, and network administrators. ESM can enable congestion management to correct performance-related issues, queue depths on individual HBAs can be reviewed, and HBA optical transceiver statistics can be downloaded and analyzed to detect optic degradation quickly in order to minimize downtime.
  • Adapt—Emulex and Brocade have collaborated to help identify and remediate SAN issues using Fabric Notifications. The notifications signal IT that an issue has been detected, enabling IT to review the issue, and provide policy-based remediation options.

Key to ESM technology are Emulex managed HBAs. Managed HBAs are intelligent adapters that are designed to reduce the complexity of managing enterprise-class SANs. Unlike other adapters, managed HBAs are designed to perform many operational tasks without the intervention of the host on which they reside. Managed HBAs differ from other adapters because they:

  • Register with the fabric as managed devices.
  • Communicate across the fabric, in-band, to the Emulex SAN Manager centralized management tool.
  • Collaborate with the fabric to identify and address performance problems with fabric notifications.
  • Monitor and record performance data and fabric notifications for analysis.

Use Cases

Remediate application performance problems due to network congestion—When new data center infrastructure is deployed, administrators have well-established methods for right-sizing the server, HBA, Fibre Channel network, and storage to deliver the expected performance. Over time, some workloads outgrow the server they run on, and the server runs out of CPU cycles, memory, PCIe bandwidth, and/or HBA bandwidth. The cause can range from too many VMs on a virtualized server to an application that has outgrown its hardware. The server then requests more data from the storage system than it can ingest, which causes slow-drain or congestion problems in the fabric. The impact falls on both the overutilized server, where I/O latency often increases by 10x or more, and other servers in the fabric, which can see their performance cut dramatically. Because Fibre Channel is a lossless network, the overwhelmed server ties up the hardware resources of the fabric, creating a performance issue across the entire fabric.

Reduce complexity and the risk of an unstable operating environment due to lack of adequate change management solutions—Organizations’ operating environments can become unstable due to unsupported firmware and driver pairings, configuration drift out of policy, inability to validate system updates and upgrades, and misconfigured multipathing that may result in downtime.

Reduce time spent managing HBA optical transceivers—SLAs can be compromised due to optical transceiver failures and/or a lack of transceiver management solutions. Failed optics can cause application performance issues or crashes. Optics begin to degrade on first use and will eventually wear out; data center admins in large networks change numerous optical transceivers every month—a very time-intensive activity, made more difficult when customers either don’t have access to the transceiver data or they have an abundance of transceiver data and port statistics but have no good way of viewing it and analyzing it.

ESG’s review of Emulex SAN Manager is focused on how ESM’s tools help IT to resolve complex issues quickly, reducing operational effort and cost. Features of interest in this context include Adaptive Congestion Management, in-band HBA management, and a SAN-wide view of all hosts.

ESG Tested

ESG examined a test SAN using ESM. ESM provides three main functions: congestion management, inventory, and transceiver data. We reviewed the inventory feature, confirming configurations, firmware versions, and driver versions from the landing page. We were able to identify multipath exceptions with one click. A multipath exception is a physical-layer finding: a host connected through a single port, or a host that is dual attached through a single switch. Next, we confirmed that the transceiver data feature was available for all managed HBAs in the environment. We then performed an in-depth walkthrough, starting with a congestion detection and management exercise. Everything provided by the ESM GUI is also available through a scripted interface, so data can be collected over time for deeper analysis with tools like Splunk, for example to detect degradation of optical modules and transceivers.
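The multipath exception check described above can be illustrated with a short Python sketch. This is not ESM's implementation: the record layout (host, WWPN, switch), the function name, and the sample values are all assumptions made for illustration. The sketch flags hosts with only one connected port and hosts whose ports all land on the same switch.

```python
from collections import defaultdict


def find_multipath_exceptions(ports):
    """Return {host: reason} for hosts that lack physical path redundancy.

    `ports` is an iterable of (host, wwpn, switch) tuples. This layout is a
    hypothetical stand-in for inventory data, not an ESM export format.
    """
    wwpns_by_host = defaultdict(set)
    switches_by_host = defaultdict(set)
    for host, wwpn, switch in ports:
        wwpns_by_host[host].add(wwpn)
        switches_by_host[host].add(switch)

    exceptions = {}
    for host in sorted(wwpns_by_host):
        if len(wwpns_by_host[host]) < 2:
            # Only one HBA port is connected to the fabric.
            exceptions[host] = "single port"
        elif len(switches_by_host[host]) < 2:
            # Two or more ports, but all attached to the same switch.
            exceptions[host] = "single switch"
    return exceptions


# Made-up sample data: db01 is redundant, app01 and app02 are not.
sample_ports = [
    ("db01", "10:00:00:00:c9:00:00:01", "switch-a"),
    ("db01", "10:00:00:00:c9:00:00:02", "switch-b"),
    ("app01", "10:00:00:00:c9:00:00:03", "switch-a"),
    ("app02", "10:00:00:00:c9:00:00:04", "switch-a"),
    ("app02", "10:00:00:00:c9:00:00:05", "switch-a"),
]
print(find_multipath_exceptions(sample_ports))
```

Running a check of this kind before taking a switch offline shows which hosts would lose all paths during the maintenance window.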

First, ESG used the ESM GUI to look at congestion management in our test bed. Fibre Channel SAN congestion is uncommon, and most environments do not experience it; for those that do, we believe this to be an especially useful tool. As shown in Figure 3, the top of the dashboard provides an at-a-glance summary of the health of all HBAs, congestion detected in HBAs and the fabric, and congestion settings in the environment. ESM also pulls useful data from hosts, including queue depth settings (a customer-requested addition), since incorrect queue depth settings are often implicated in performance issues.

ESG clicked the host entry that was showing congestion, clicked the menu button on the right side of the screen, and looked at bandwidth/response time history and congestion history, as shown in Figure 4. This server was attached via 16G Fibre Channel, and it was consuming almost all its available bandwidth—1,500 MB/sec. Looking at the congestion history shows that this host is causing congestion both at the HBA and on the SAN. This is a common issue, where an ill-behaved application consumes more resources than the server can provide and consumes every bit of SAN bandwidth available to it. ESG selected Update Congestion Info and changed the setting for this port from Monitor Only to Moderate.

Almost immediately, bandwidth consumption dropped by nearly a third and response times dropped from more than 4,000 µsec to just over 500 µsec, a reduction of more than 8x. At this point, congestion at the HBA dropped to none and SAN congestion cleared as well.
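To put the reported figures in proportion, the short Python calculation below works through the arithmetic. It takes the measurements at their round values (4,000 and 500 µsec) and assumes a nominal 16G Fibre Channel throughput of 1,600 MB/sec per direction; that nominal figure is not stated in this review.

```python
# Assumption: nominal per-direction throughput of a 16G Fibre Channel link.
NOMINAL_16GFC_MB_S = 1600

observed_mb_s = 1500       # bandwidth reported before the setting change
latency_before_us = 4000   # "more than 4,000 µsec", taken as 4,000
latency_after_us = 500     # "just over 500 µsec", taken as 500

# Share of the link the host was consuming before congestion management.
utilization = observed_mb_s / NOMINAL_16GFC_MB_S

# Improvement expressed as a ratio and as a fractional reduction.
latency_ratio = latency_before_us / latency_after_us
latency_cut = 1 - latency_after_us / latency_before_us

print(f"link utilization before: {utilization:.0%}")
print(f"latency improvement: {latency_ratio:.0f}x ({latency_cut:.1%} lower)")
```

At these round values the host was using about 94% of the link, and the response time ratio works out to 8x, or an 87.5% reduction.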

The adaptive nature of the algorithm means that it will continue to monitor the host and adjust as needed to remediate congestion. In practice, organizations could choose to enable congestion management for all tier-two and tier-three workloads at first, to ensure that those lower priority workloads can’t negatively impact SAN performance, then enable it for all hosts as they gain confidence, using it as an indicator that it’s time to resize a server for a growing workload.
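The phased rollout suggested above can be expressed as a simple policy function. The Python sketch below is illustrative only: the host names, the tier numbering, and the function itself are assumptions. "Monitor Only" and "Moderate" are the two settings that appear in this test, and ESM may offer others.

```python
def congestion_settings(host_tiers, phase):
    """Map each host to a congestion setting for a given rollout phase.

    host_tiers: dict of host name -> tier number (1 = highest priority).
                This structure is hypothetical, not an ESM data format.
    phase 1:    manage only tier-two and tier-three hosts.
    phase 2:    manage every host.
    """
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    settings = {}
    for host, tier in host_tiers.items():
        managed = phase == 2 or tier >= 2
        settings[host] = "Moderate" if managed else "Monitor Only"
    return settings


# Made-up hosts for illustration.
hosts = {"erp-db": 1, "build-farm": 2, "archive": 3}
print(congestion_settings(hosts, phase=1))
print(congestion_settings(hosts, phase=2))
```

Keeping the policy in one place makes it easy to review which hosts change when moving from the first phase to the second.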

Next, ESG walked through the ESM Inventory Feature. Emulex SAN Manager provides visibility into all the HBAs connected to the SAN so administrators can:

  • Verify that firmware versions currently align with corporate standards.
  • Verify that driver versions align with current corporate standards.
  • Identify driver-firmware mismatches.
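The three checks above amount to comparing each inventory row against a policy. The Python sketch below shows one way that comparison could be scripted, assuming the inventory has been exported into a list of dictionaries. The field names, version strings, and the supported-pairs table are hypothetical placeholders, not ESM output or real Emulex versions.

```python
def check_hba_compliance(inventory, standard, supported_pairs):
    """Classify each HBA against corporate standards.

    All three arguments use hypothetical layouts:
      inventory       - list of dicts with 'wwpn', 'firmware', 'driver'
      standard        - dict with the approved 'firmware' and 'driver'
      supported_pairs - set of (firmware, driver) tuples known to work together
    Returns {wwpn: [issues]} for HBAs with at least one issue.
    """
    findings = {}
    for hba in inventory:
        issues = []
        if hba["firmware"] != standard["firmware"]:
            issues.append("firmware off standard")
        if hba["driver"] != standard["driver"]:
            issues.append("driver off standard")
        if (hba["firmware"], hba["driver"]) not in supported_pairs:
            issues.append("driver-firmware mismatch")
        if issues:
            findings[hba["wwpn"]] = issues
    return findings


# Placeholder versions and identifiers, for illustration only.
standard = {"firmware": "fw-2.0", "driver": "drv-2.0"}
supported_pairs = {("fw-2.0", "drv-2.0"), ("fw-1.0", "drv-1.0")}
inventory = [
    {"wwpn": "hba-a", "firmware": "fw-2.0", "driver": "drv-2.0"},
    {"wwpn": "hba-b", "firmware": "fw-1.0", "driver": "drv-1.0"},
    {"wwpn": "hba-c", "firmware": "fw-2.0", "driver": "drv-1.0"},
]
for wwpn, issues in check_hba_compliance(inventory, standard, supported_pairs).items():
    print(wwpn, issues)
```

Separating "off standard" from "mismatch" matters because an HBA can run an older but internally consistent pairing (hba-b here) or a partially upgraded, unsupported one (hba-c).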

Emulex SAN Manager provides a complete inventory of the SAN including managed HBAs (intelligent HBAs designed to reduce the complexity of managing enterprise-class storage networks) and unmanaged HBAs (legacy HBAs including third-party HBAs).

Emulex SAN Manager retrieves the following parameters from the SAN: WWPN, HBA, WWNN, PID, model, model description, vendor ID, serial number, firmware version, driver version, host name, OS name and version, fabric name, and link speed. Emulex SAN Manager provides additional features that make it an ideal tool for managing large environments, including multilevel filtering options that allow administrators to quickly sift through the data and identify critical endpoints, and the multipath validation tool that allows SAN administrators to easily identify potential misconfiguration errors before taking a switch offline for maintenance or upgrading.

The window in Figure 6 shows the status of Emulex managed HBAs as well as unmanaged HBAs. It’s important to note that only a portion of the window is shown; this image has been cropped due to space limitations. There are many more columns of data available to view in the inventory.

Finally, ESG looked at the transceiver data feature. Fibre Channel HBAs are recognized for their extreme reliability; however, one of the most common causes for HBA downtime is optics failure. When managed HBAs are detected in the environment, Emulex SAN Manager communicates with them in-band across the SAN to retrieve a complete, up-to-date set of real-time HBA transceiver data to help identify trends that can signal potential optics failures. This enables administrators to track and identify optics problems and mitigate them before the optics fail, ensuring maximum uptime and performance.

Emulex SAN Manager retrieves the following transceiver data: WWPN, host name, part number, vendor, revision, OUI, ID, Ex ID, connector, wavelength, supported speeds, manufacturer date, temperature, current, Rx power, Tx power, and voltage. Optics information can also be refreshed at the click of a button to get real-time data for diagnosing SFP issues.
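Because readings such as Rx power can be collected over time, a simple trend test is one way to spot optics that are degrading. The Python sketch below fits a least-squares line to Rx power samples per HBA and flags those whose power is falling faster than a threshold. The data layout, the threshold value, and the function names are assumptions chosen for illustration; they are not ESM features or vendor-recommended limits.

```python
def rx_power_slope(samples):
    """Least-squares slope of Rx power (dBm) per day.

    samples: list of (day, rx_power_dbm) tuples, a hypothetical layout.
    Returns 0.0 when all samples share the same day.
    """
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(power for _, power in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return 0.0
    sxy = sum((day - mean_x) * (power - mean_y) for day, power in samples)
    return sxy / sxx


def flag_degrading_optics(history, slope_limit=-0.05):
    """Return sorted WWPNs whose Rx power falls faster than slope_limit.

    slope_limit is in dBm per day; the default is an arbitrary example value.
    """
    return sorted(
        wwpn
        for wwpn, samples in history.items()
        if len(samples) >= 2 and rx_power_slope(samples) < slope_limit
    )


# Made-up readings: hba-a is stable, hba-b loses about 0.2 dBm per day.
history = {
    "hba-a": [(0, -3.0), (1, -3.0), (2, -3.1), (3, -3.0)],
    "hba-b": [(0, -3.0), (1, -3.2), (2, -3.4), (3, -3.6)],
}
print(flag_degrading_optics(history))
```

A fitted slope is less sensitive to a single noisy reading than comparing the first and last samples, which is why it is used here. The same approach would apply to temperature, current, or Tx power.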

Why This Matters

ESG asked organizations to name their biggest challenges in terms of their on-premises storage environment. Among SAN-related challenges, respondents cited rapid data growth (24%), running out of physical space (21%), and staff costs (20%), along with device management and poor performance. Organizations looking to simplify administration and enhance availability and performance guarantees for on-premises workloads need a solution that can provide end-to-end visibility, management, and troubleshooting.

ESG validated that ESM simplifies SAN management and accelerates problem resolution through an agentless, in-band management platform that gathers, aggregates, and correlates relevant HBA metrics; tracks HBA optical transceiver health; identifies configuration, firmware, driver, and multipath issues; and provides policy-based congestion management for managed HBAs. ESM’s ability to identify and remediate these issues enabled us to measure and troubleshoot performance with immediate results.

The Bigger Truth

The amount of data that organizations are currently generating and collecting is driving higher capacities, which require more infrastructure, resulting in higher complexity in the IT environment. It does not help that organizations are experiencing skills shortages across multiple disciplines, including storage, and are moving toward hiring more IT generalists rather than storage specialists. ESG research reveals that 62% of organizations view the majority of new IT positions as being filled by these generalists. To address the increasing complexity of their IT environments while simplifying management and administration tasks, organizations need a management solution that reduces operational complexity, enhances availability, and guarantees performance.

ESG validated that Emulex SAN Manager can help organizations to simplify availability and performance management within the data center, specifically by enabling the management of FC HBAs via the SAN. ESG testing showed that ESM’s in-band, agentless platform can identify congestion at the FC HBAs and on the SAN, then mitigate the impact of ill-behaved applications or systems on performance. With ESM, organizations gain comprehensive visibility into multiple areas that can impact availability and performance, including HBA inventory, hardware and transceivers, firmware and drivers, multipath connectivity, and queue depth.

Complex storage, server, VM, and SAN deployments are becoming the norm to support modern data centers, and the challenges they bring are here to stay. In this context, SAN availability and performance are critical components of IT operations, and making configuration changes must take a backseat to troubleshooting. Modern SAN management systems must focus on accelerating and simplifying arduous day-to-day management and problem solving. Organizations need to reduce the time, effort, and specialized personnel devoted to problem resolution, and enable IT to focus on provisioning and optimizing the environment to meet current and future business needs. ESM’s in-band HBA management, SAN-wide view of all hosts, and Adaptive Congestion Management work together to address these issues. Organizations looking to increase availability, improve IT efficiency, and reduce the time required to resolve complex issues should take a serious look at Emulex SAN Manager.

