ESG Validation

ESG Lab Review: Application-aware Management, Visibility, and Analytics in Virtualized Environments with Uila

Abstract

This ESG Lab Report validates Uila’s full stack management, visibility, and analytic capabilities. Testing focused on fast root cause analysis through application visibility, complete infrastructure monitoring with application context, and end-user experience monitoring for improved application interactions.

The Challenges

The IT modernization movement is well underway. From on-premises virtualization to multi-cloud infrastructures, organizations continue to look for ways to consolidate their infrastructure and improve operational efficiency. Whether it is the result of an organization-wide mandate from upper management or simple cost reduction initiatives within smaller business units, the goal is clear: Make the infrastructure lean and cost-effective. Unfortunately, infrastructure consolidation and operational efficiency do not go hand in hand. IT administrators already lack an ideal tool to manage and monitor a massive infrastructure made up of multi-vendor components. How can they be expected to not only consolidate all of it, but also effectively manage it? The simple answer is they can’t—not using traditional management and troubleshooting tools that provide pointed visibility without context.

There are tools that monitor and report information at a server level from the hypervisor and VMs, but what about the applications? There are tools that monitor the applications, but what about the underlying infrastructure? There are core infrastructure management tools that require agents, but what about the VMs and what is running on them? This proliferation of specialized tools has paved the way for modern application performance and network monitoring tools that can monitor all traffic throughout the infrastructure to provide a better picture of what is truly going on, but even these approaches are missing the ability to quickly and easily tie all of that collected data together to swiftly identify and troubleshoot problems. Organizations are looking for a solution that manages and monitors everything from the end-user’s interaction with an application, through the VM where that application resides, the virtualization layer, and the virtual and physical network, to the underlying hardware that eventually services the request, including storage and compute. This type of full stack visibility is essential to completing the transformation to a modern data center.

The Solution: Uila

Uila is an application-aware infrastructure management and monitoring software solution that provides visibility into all aspects of complex IT environments. This full stack visibility enables organizations to map out all dependencies of an application, from the end-user interaction with the application itself to its underlying resources, creating a granular topology of the entire infrastructure. Key insights into the overall efficiency of the infrastructure can then be drawn from that topology. This not only speeds troubleshooting to reduce infrastructure outages, but also helps organizations prevent these outages altogether. This level of automated proactivity creates an environment where application uptime is nearly guaranteed. By aligning application and infrastructure performance monitoring, organizations gain access to direct correlations between performance expectations, real-time measurements, issue detection, and single-click remediation to keep the complete stack running at peak performance.

From an application standpoint, Uila auto-discovers an application and tracks real-time performance across its entire infrastructure topology, including any interdependencies among application servers and across resource tiers. This not only includes core component metrics, such as network response time, storage latency, IOPS, CPU, and memory utilization, but also application-specific metrics, such as transactions or query response times. Uila covers both physical and virtual application servers, with details being monitored on a per-process level, allowing root cause analysis to go deeper, and providing faster and better insights into what is happening.

With complex virtual infrastructures made up of a mix of vendors, Uila can provide detailed monitoring statistics across popular virtualization layers, such as VMware and Hyper-V, to provide a complete view of the virtualized infrastructure in correlation with multi-vendor physical resources. Uila’s intuitive interface allows organizations to easily spot underlying component issues at a compute, storage, and networking level, such as overused or bottlenecked resources, especially when application service performance is impacted. Specifically for networking, Uila bridges the management gap between physical and virtual networking components to provide a true view of network traffic from one VM to another.

Together, Uila’s application and infrastructure monitoring eliminates the traditional pain points of resource silos and incomplete vendor-specific management platforms. Color-coded application-to-infrastructure dependency maps tie every component together and simplify not only the detection of issues for faster root cause identification, but also, more importantly, the prevention of issues altogether.

Uila Architecture and Key Components

The Uila architecture takes a hybrid approach that combines on-premises with the cloud to deliver a fast, scalable solution to meet the real-time needs of the business. Key to the platform is the 100% agent-less architecture combining deep packet inspection technology with virtual infrastructure management. Three software components make up the architecture (see Figure 2).

Uila Management and Analytics System (UMAS)

UMAS serves as the brain of the platform and can be deployed on-premises or in the cloud. This analytics engine receives all metadata from the core infrastructure and correlates full stack performance metrics from applications to the underlying infrastructure. As part of UMAS, the Uila Dashboard provides a rich and intuitive interface to view analysis completed by UMAS, enabling complete infrastructure health reporting and visibility.

Uila Virtual Smart Tap (vST)

The vST is deployed in right-sized guest VMs across the data center and monitors all traffic on the virtual networks. Deep packet inspection enables the software to identify not only 4,000+ applications, but also the attributes of individual requests within each packet. As part of the vST, application response time, latency, and other network metric metadata is collected. The application and network metadata is then passed to the Virtual Information Controller, where it is combined with other collected infrastructure metric metadata.

Uila Virtual Information Controller (vIC)

The vIC serves as the intermediary between the UMAS and the infrastructure. A Uila monitoring domain is built on the vIC using a configuration template, which then allows for quick and easy deployment of as many vSTs as required. vIC collects all performance metrics from the physical infrastructure through a virtualization management system, such as VMware vCenter. This data is combined with the application and networking metadata collected from the vSTs and then gets transmitted to the UMAS for processing, analysis, and visualization through the user interface.
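The data flow among the three components can be sketched in a few lines of Python. This is only an illustration of the flow described above, not Uila's implementation: the function names, record fields, VM names, and metric values are all hypothetical, and the join on VM name is an assumption about how the two metadata streams might be combined.

```python
from typing import Dict, List


def vst_collect() -> List[dict]:
    """Hypothetical vST output: per-VM application and network metadata."""
    return [
        {"vm": "web-01", "app": "http", "app_response_ms": 35.0},
        {"vm": "db-01", "app": "mysql", "app_response_ms": 120.0},
    ]


def infrastructure_metrics() -> Dict[str, dict]:
    """Hypothetical per-VM metrics read from a virtualization manager."""
    return {
        "web-01": {"cpu_pct": 40.0, "read_latency_ms": 2.0},
        "db-01": {"cpu_pct": 85.0, "read_latency_ms": 30.0},
    }


def vic_combine(app_records: List[dict], infra: Dict[str, dict]) -> List[dict]:
    """Join application/network metadata with infrastructure metrics by VM."""
    combined = []
    for record in app_records:
        merged = dict(record)                       # copy, leave input untouched
        merged.update(infra.get(record["vm"], {}))  # add infrastructure metrics
        combined.append(merged)
    return combined


def umas_ingest(records: List[dict]) -> Dict[str, dict]:
    """Index the combined records by VM so they can be analyzed and displayed."""
    return {record["vm"]: record for record in records}


view = umas_ingest(vic_combine(vst_collect(), infrastructure_metrics()))
print(view["db-01"])
```

The point of the sketch is the ordering: collection happens close to the traffic, combination happens at the intermediary, and analysis happens centrally on records that already carry both application and infrastructure context.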

Full-stack Visibility and Application Context for Faster Root Cause Analysis

The first phase of testing focused on the full stack visibility of Uila that provides application context for faster root cause analysis.

Root Cause Starting at the Application

ESG started with an introduction to the Uila dashboard. The single-pane-of-glass user interface presents all information monitored by Uila software with easily understandable color coding, from applications to underlying resources. If everything appears green, the infrastructure is considered healthy. If colors in the yellow, orange, or red spectrum appear, a potential problem or full-blown issue is present. The color depends on how far a measurement deviates from the measured and auto-learned baseline; for example, a measurement 20% above the established baseline is displayed in red. The dynamically learned, auto-adjusted sliding baselines can be overridden by a user-configurable threshold. Thresholds are not set by default out of the box, but they can be configured based on personal preference. Five color wheels highlight the health of monitored resources: application, network, CPU, storage, and memory. Each wheel is layered to show the resource at successive levels of the infrastructure. For example, the application color wheel (see Figure 3) highlights application performance at a core data center level, a cluster level, a host level, and finally, a VM level. Specifically, at the VM level, multiple VMs are displayed around the ring of the circle at varying sizes based on the total number of transactions per VM.
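The baseline-driven color coding can be illustrated with a short Python sketch. Only the rule that a measurement 20% above baseline appears red comes from ESG's observations. The yellow and orange cutoffs, the use of a simple moving average as the sliding baseline, and the choice to let a user threshold replace the baseline as the reference value are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean
from typing import Optional

# Fractional deviation above the reference at which each color appears.
# Only the 20% -> red rule is stated in the report; the yellow and orange
# cutoffs are assumptions made for this sketch.
RED_CUTOFF = 0.20
ORANGE_CUTOFF = 0.10
YELLOW_CUTOFF = 0.05


class SlidingBaseline:
    """Mean of the most recent samples, standing in for an auto-learned baseline."""

    def __init__(self, window: int = 5) -> None:
        self._samples: deque = deque(maxlen=window)

    def update(self, sample: float) -> None:
        self._samples.append(sample)

    def value(self) -> float:
        if not self._samples:
            raise ValueError("no samples recorded yet")
        return mean(self._samples)


def health_color(measurement: float, baseline: float,
                 user_threshold: Optional[float] = None) -> str:
    """Map a measurement to a color by its deviation above the reference."""
    # Assumption: a user-configured threshold replaces the learned baseline.
    reference = baseline if user_threshold is None else user_threshold
    if reference <= 0:
        raise ValueError("reference must be positive")
    deviation = (measurement - reference) / reference
    if deviation >= RED_CUTOFF:
        return "red"
    if deviation >= ORANGE_CUTOFF:
        return "orange"
    if deviation >= YELLOW_CUTOFF:
        return "yellow"
    return "green"


baseline = SlidingBaseline(window=3)
for latency_ms in (10.0, 10.0, 10.0):
    baseline.update(latency_ms)
print(health_color(12.5, baseline.value()))
```

With a learned baseline of 10 ms, a 12.5 ms measurement sits 25% above baseline and maps to red, while the same measurement against a user threshold of 20 ms would map to green.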

A very important aspect of the interface is the histogram timeline appearing at the top of the dashboard, where a time-slider can be used and adjusted to focus on a specific point in time as desired. Five rows are displayed that highlight the health of the monitored application performance, CPU, memory, storage, and network. ESG moved the slider to a time range between 3 AM and 6 AM that showed very poor application performance and the entire dashboard instantly updated to reflect what was happening throughout the infrastructure during that period. A view of the updated dashboard is shown in Figure 4. Note the application and storage wheels, which show a significant amount of red. Just by mousing over a single entity displayed within any of the health wheels, information related to the entity is displayed, including granular statistics like services running on a specific VM and their response times, transactions, throughput, and packets.

ESG navigated to the application analysis monitoring screen, which displayed an updated timeline specific to application transactions and response time. The main section of the screen displayed impressive details about the infrastructure supporting the applications, including topology maps and dependent services, which are automatically created by Uila upon deployment of the software. The dependent services map viewed by ESG (see Figure 5) displayed a single load balancer branching to two WebLogic servers, then database servers, and eventually three external devices, two of which were gateways. One of the gateways provided DNS service, which explains why nearly all the underlying database servers were connected to it.

Based on the colors in the map, ESG easily pinpointed an application bottleneck, and by mousing over the red dot, a view of the services was shown (see Figure 6). An issue with the MySQL service was present.

Having identified the underlying service that was causing the problem, ESG simply clicked on the MySQL service and root cause analysis was completed in seconds by Uila. Uila correlated the application degradation to the underlying infrastructure and identified storage as the root cause of the degraded service (see Figure 7).

ESG dug deeper by clicking on the storage health panel, which presented latency and IOPS metrics at a virtualization layer, hypervisor kernel, vDisk, and storage data store level. As shown in Figure 8, read latency was a concern at all three levels. There also appeared to be a strong correlation between poor read latency and higher read IOPS. By simply mousing over the read IOPS and clicking into a detailed view, ESG identified a single VM as the culprit consuming too many resources. This is the “noisy neighbor,” a common issue that can bring a virtual environment to its knees.
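The noisy neighbor check ESG performed by hand amounts to finding the VM that accounts for a disproportionate share of read IOPS on shared storage. A minimal Python sketch follows, assuming per-VM read IOPS are already available. The VM names, IOPS values, and the 50% share cutoff are hypothetical and are not taken from the product.

```python
from typing import Dict, Optional, Tuple


def find_noisy_neighbor(read_iops_by_vm: Dict[str, float],
                        share_cutoff: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (vm, share) for the top read-IOPS consumer if it dominates.

    share_cutoff is an assumed value: the fraction of total read IOPS a
    single VM must reach before it is flagged.
    """
    total = sum(read_iops_by_vm.values())
    if total <= 0:
        return None  # nothing measured, nothing to flag
    vm, iops = max(read_iops_by_vm.items(), key=lambda item: item[1])
    share = iops / total
    return (vm, share) if share >= share_cutoff else None


# Hypothetical read IOPS for three VMs sharing one data store.
sample = {"vm-a": 400.0, "vm-b": 300.0, "vm-c": 3300.0}
print(find_noisy_neighbor(sample))
```

In this made-up sample, one VM generates most of the read IOPS and is flagged. When load is evenly spread, the function returns nothing, which mirrors the case where high latency has some other cause.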

Root Cause Starting at the Physical Infrastructure

Having completed root cause analysis of a degraded application, ESG shifted focus to the underlying infrastructure resources. With comprehensive visibility into both applications and physical infrastructure, ESG walked through the same example, but this time started at the physical infrastructure level with a goal of highlighting Uila’s ability to deliver core infrastructure monitoring with context around the applications running on said infrastructure. After navigating to the storage analysis screen, ESG used the timeline to easily pinpoint a time period during which an issue had arisen. Once selected, a storage usage view (see Figure 9) highlighted a map from the physical disk, to the data store, the host, the VM, and the vDisk. At every point on the map, ESG could mouse over and dig deeper to identify the services that could be causing the issue. Within the storage analysis view, ESG also learned about Uila’s infrastructure alarms, which always display a correlation between a specific infrastructure component and a service or application that uses that component. Alarms are triggered based on predefined latency thresholds.

For completeness, ESG also viewed the same infrastructure timelines at a memory and CPU analysis level. Unique visualizations highlighted the full stack based on each resource view. For memory, the circle packing view was used (see Figure 10).

For the CPU analysis, the tree view helped display a complete traversal of the full stack (see Figure 11). In both cases, each unique component could be moused over and clicked on for more detailed statistics to help quickly identify a physical or virtual resource that was the root of the problem or an application server whose service performance (response time) was impacted.

Simplifying the Management and Monitoring Complexities of Physical and Virtual Networking

The final resource ESG focused on was the network. When it comes to networking, many tools exist that help with the management of the physical network, but with the increased use of server virtualization, new complexities are introduced related to virtual networking management. Uila can tie together the management and monitoring of the physical and virtual networks within an infrastructure to visualize not just north-to-south traffic, but east-to-west (VM-to-VM), too.

ESG used the networking flow analysis chart (see Figure 12) to present a NetworkSwitch view of the environment, which depicted a complete traversal of the physical and virtual network. The first two columns, the Top-of-Rack (TOR) switch and Host, represent the physical network, while the next three columns, including the distributed virtual (dv) Switch, port group (VLAN), and VM, represent the virtual network. The right-most column, where everything ends, represents the classifier (application protocol) of the traffic type, such as a transaction to a MySQL application. By simply mousing over any of the columns, detailed network statistics are displayed and, like almost everything else in the user interface, are color-coded based on health. Reported statistics include network response time, fatal retries, virtual packet drops, zero windows, and resets.

Taking the visualization of the virtual network a step further, ESG explored the ability to view VM-to-VM and VM-to-PhysicalServer connections. In the network conversation tab, the Top N Sankey view highlights the VM pairs communicating. As shown in Figure 13, this view not only highlights the pairs, but the width of the bars also represents the bandwidth being consumed by said pair. In this example, one pair was consuming nearly 100% of the virtual network’s bandwidth. By mousing over the wide bar, ESG saw the services that were responsible for the network bandwidth consumption, as well as their application response times, transactions, and throughput. The MySQL service was responsible for the large chunk of bandwidth between the two VMs.
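The ranking behind a Top N view of VM pairs can be approximated in Python as follows. This is a sketch under stated assumptions rather than a description of how Uila builds its Sankey view: traffic in both directions is folded into one unordered pair, bandwidth is represented by byte counts, and all VM names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[str, str, float]  # (source VM, destination VM, bytes transferred)
Pair = Tuple[str, ...]


def top_pairs(flows: Iterable[Flow], n: int = 3) -> List[Tuple[Pair, float]]:
    """Rank VM pairs by their share of total observed traffic."""
    totals: Dict[Pair, float] = defaultdict(float)
    for src, dst, nbytes in flows:
        # Assumption: direction is ignored, so A->B and B->A are one pair.
        pair = tuple(sorted((src, dst)))
        totals[pair] += nbytes
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(pair, nbytes / grand_total) for pair, nbytes in ranked[:n]]


# Hypothetical flows; one pair dominates the virtual network.
flows = [
    ("app-01", "db-01", 900.0),
    ("db-01", "app-01", 50.0),
    ("web-01", "app-01", 30.0),
    ("web-01", "lb-01", 20.0),
]
for pair, share in top_pairs(flows, n=2):
    print(pair, round(share, 2))
```

The share returned for each pair corresponds to the bar width in the view ESG examined: the wider the bar, the larger the fraction of bandwidth that pair consumes.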

Why Full Stack Visibility Matters

As organizations continue the pursuit of becoming as operationally efficient as possible using virtualization, complexities around visibility and management are cause for concern. What happens if something goes wrong? Who or what is responsible for the outage? Could it have been prevented? These questions are not easy to answer, especially in virtualized environments. Tools exist to help with the infrastructure monitoring, but they are often siloed based on the type of resource, platform, or vendor. Further, some modern management tools help tie the virtual infrastructure to the physical infrastructure, but still lack the “smarts” to provide context related to the application. Given the time it takes to resolve a single issue, the limited transparency and visibility, the inability to correlate across the entire infrastructure stack, and the ongoing headaches of IT administrators who are responsible for fixing issues, organizations want a smarter, leaner platform that takes the guesswork out of managing an expansive, vendor-diverse virtualized environment.

ESG validated the comprehensive management and monitoring of Uila’s software platform. From the core physical resources to the applications living in VMs, ESG witnessed full stack visibility and reporting that enabled rapid insight into the performance of an expansive, dynamic virtualized infrastructure. Uila’s interactive user interface provided an impressively detailed view of the infrastructure. Root cause analysis was completed quickly, whether it was started from the application to the infrastructure, or vice versa. With the ability to simply point and click on nearly any component or metric, ESG easily diagnosed a storage issue impacting application latency.


Monitoring and Troubleshooting the End-user Experience

The second phase of ESG’s analysis focused on the newly released end-user experience monitoring and troubleshooting feature of Uila’s software. Together with its full stack visibility, Uila can ensure a positive end-user experience by enabling proactive service remediation.

ESG navigated to the user experience dashboard (see Figure 14). As with all the other modules, a timeline is displayed at the top of the interface, which contains historical data related to key metrics, including transaction time, network delay time, and application response time. Centered in the interface are individual line charts based on IP address ranges, and this example focused on two locations, Taiwan and San Francisco. The charts present overall end-user transaction response times, which are broken down into three sub-times. The data process time represents the client-side time required to complete the process of receiving all data from the application. Network delay time is the time introduced by the physical network between the remote side and the virtual data center. The application response time is the time consumed by the VM server. Taken together, whenever end-user response time slows, these measurements enable IT administrators to pinpoint the culprit behind specific issues at local or remote sites, including at a data center server or WAN networking level.
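The three-part breakdown of end-user response time lends itself to a small worked example in Python. The sketch below assumes that the overall transaction response time is the sum of the three sub-times. The class name, field names, and millisecond values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TransactionTiming:
    data_process_ms: float   # client-side time to receive all data
    network_delay_ms: float  # time added between remote site and data center
    app_response_ms: float   # time consumed by the VM server

    @property
    def total_ms(self) -> float:
        # Assumption: end-user response time is the sum of the three parts.
        return self.data_process_ms + self.network_delay_ms + self.app_response_ms

    def shares(self) -> Dict[str, float]:
        """Fraction of the total contributed by each component."""
        total = self.total_ms
        if total <= 0:
            raise ValueError("total response time must be positive")
        return {
            "data_process": self.data_process_ms / total,
            "network": self.network_delay_ms / total,
            "application": self.app_response_ms / total,
        }

    def dominant(self) -> str:
        """Name of the component contributing the most time."""
        shares = self.shares()
        return max(shares, key=shares.get)


# Hypothetical sample in which the application side dominates.
sample = TransactionTiming(data_process_ms=60.0,
                           network_delay_ms=40.0,
                           app_response_ms=200.0)
print(sample.total_ms, sample.dominant())
```

In this made-up sample, the application component accounts for two-thirds of the total, which tells an administrator to look at the server side rather than the WAN.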

ESG zoomed in on one of the repeating transaction response time bumps. As shown in Figure 15, nearly two-thirds of the end-user transaction response time was due to a greater than normal application response time on the VM server side. The bottom corners in each graph highlight the problematic area as identified by Uila. In this example, the physical network appeared green, meaning it was not the issue, while the virtual server button appeared orange, enabling ESG to focus there. By clicking on the virtual server button, granular details about the virtual server were displayed, including the server name and its health scores, which correlated the high response time to a poor application health score.

After clicking the server’s name (wpserver), ESG was presented with even more details (see Figure 16), including dependent services and an infrastructure topology map. The topology map showed a virtual database server directly in the data flow that was also problematic. From within the tooltip of wpserver, ESG clicked on the http service provided by the server for root cause analysis.

The root cause viewer highlighted the likely cause to be a dependent service running on the database server (see Figure 17). The service was MySQL, which the application relied on to satisfy the end-user’s request. By clicking on the problematic dependent service, ESG drilled down into the detailed resource view within the database server, which showed the primary cause of the issue was CPU health. From there, ESG identified that the VM’s CPU ready metric appeared high, which led to the discovery that the physical host was near-bottlenecked at a CPU level. It should be noted that Uila does not fix the issue, but provides the knowledge and documentation on how to fix it.
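The drill-down ESG followed, from the web service to its dependent database service to the CPU of the VM and then the host, resembles a walk along the unhealthiest branch of a dependency graph. The Python sketch below illustrates that idea only. Apart from wpserver, the node names are hypothetical, as are the health scores, the 0 to 100 scale, and the cutoff of 50. It does not represent Uila's correlation logic.

```python
from typing import Dict, List

# Hypothetical dependency map: each node lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "wpserver/http": ["dbserver/mysql"],
    "dbserver/mysql": ["dbserver/cpu", "dbserver/storage"],
    "dbserver/cpu": ["host/cpu"],
    "dbserver/storage": [],
    "host/cpu": [],
}

# Hypothetical health scores on an assumed 0-100 scale (higher is healthier).
HEALTH: Dict[str, int] = {
    "wpserver/http": 40,
    "dbserver/mysql": 35,
    "dbserver/cpu": 20,
    "dbserver/storage": 90,
    "host/cpu": 15,
}


def trace_root_cause(start: str, depends_on: Dict[str, List[str]],
                     health: Dict[str, int], unhealthy_below: int = 50) -> List[str]:
    """Follow the least healthy unhealthy dependency until none remain."""
    path = [start]
    seen = {start}
    current = start
    while True:
        candidates = [
            dep for dep in depends_on.get(current, [])
            if dep not in seen and health.get(dep, 100) < unhealthy_below
        ]
        if not candidates:
            return path  # no unhealthy dependency left: likely root cause
        current = min(candidates, key=lambda dep: health.get(dep, 100))
        seen.add(current)
        path.append(current)


print(" -> ".join(trace_root_cause("wpserver/http", DEPENDS_ON, HEALTH)))
```

The last node on the returned path is the candidate root cause. In this sample data, the healthy storage branch is skipped and the walk ends at the host CPU, mirroring the path ESG traced interactively.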

Why End-user Experience Matters

The end-user experience is arguably the most important metric of a successful application deployment. One complaint or poor review can quickly lead to a PR nightmare, having a dramatic effect not just on customer loyalty, but also on business revenue. Maintaining an optimal application environment with no service degradations or disruptions is essential, and ensuring IT administrators are armed with the best technology to maintain that high level of service should not be overlooked.

ESG validated Uila’s ability to identify the end-user experience and correlate that experience with the components it relies on throughout the infrastructure. Root cause analysis was undertaken based on sub-optimal application response times experienced by a specific range of IP addresses. The issue was quickly tracked from the end-user requests, through the VM housing the application, and the database server that the application relied on, to a CPU bottleneck identified at the physical host level. From identification of a problem to root cause analysis, the entire process took just minutes.


The Bigger Truth

With infrastructure modernization initiatives fully underway in many organizations, the oft-overlooked concept of comprehensive management has never been more important. It is nearly a requirement to have complete visibility into what is using every resource in massive data centers, from a top-of-rack switch or recently purchased all-flash storage solution to the new virtual cluster or legacy database product. Virtualization has helped with cost and consolidation initiatives, but has introduced cause for concern related to management. Few tools can singlehandedly provide visibility into physical and virtual environment resources while also monitoring which applications are running on the VMs themselves. This has led IT administrators down the time-consuming path of trying to merge data from disparate management tools in hopes of gaining the smallest amount of insight to better identify and troubleshoot issues or, better yet, prevent them.

Uila has introduced an application-aware management, monitoring, and analytics platform to provide full stack visibility into large, dynamic virtualized environments. From the end-user’s interaction with an application, through the operating system and virtualization layer, to the underlying physical resources, Uila enables organizations to be proactive instead of reactive when it comes to maintaining a modern IT environment with the latest and greatest technology. Organizations can ensure applications continue to run at peak performance levels while maintaining optimal response times. If a problem does arise, Uila provides root cause analysis in minutes, as opposed to hours or even days, giving IT administrators peace of mind knowing they can easily manage and maintain highly virtualized production environments that serve as the lifeline of their business.

Forget about vendor-specific or application-specific management tools. Forget about disparate tools to manage individual components. Forget about tools that manage some of the infrastructure. Forget about modern tools that manage both the physical and virtual infrastructure, but lack the necessary context required to make an informed decision about an end-user interaction with an application that went horribly wrong. Let Uila do the heavy lifting of not only managing every corner of your physical and virtual infrastructure, but also tying it back to the end-user and application itself, with unparalleled comprehensive management, visibility, and insight.

