ESG Technical Validation: Dell EMC Ready Solutions for AI: Deep Learning with Intel

Introduction

This ESG Technical Validation documents our evaluation of Dell EMC Ready Solutions for AI: Deep Learning with Intel. We focused on understanding the performance, ease of use, and total cost of ownership (TCO) of the solution. To validate full-stack performance, we measured the number of tokens per second processed when training the Tensor2Tensor Big Transformer model and evaluated how Nauta, an open source initiative by Intel, accelerates deep learning model training. We also examined how Nauta simplifies the deep learning training process and how the TCO of Deep Learning with Intel compares to running the same tasks on a leading public cloud AI service.

Background

As a result of increased computing power and density, specialized artificial intelligence (AI) processors, and new algorithms, machine learning (ML) and deep learning (DL) have moved from proof of concept directly into the enterprise, where many organizations are deploying AI programs. According to ESG research, 59% of respondents expected their spending on AI/ML to increase in 2019, while 31% of organizations indicated that leveraging AI/ML in their IT products and services was one of the areas of data center modernization in which they expected to make the most significant investments in the next 12-18 months.1

Organizations looking to leverage the power of AI face significant challenges. Thirty-five percent of respondents to an ESG survey cited the cost of the IT infrastructure as their biggest challenge, while 29% cited the capabilities of the IT infrastructure, and 21% cited the application development environment (see Figure 1).2

What are the drivers behind these challenges? Deep learning is complex, and operationalizing AI is a difficult, multifaceted problem requiring trained and experienced personnel. Yet there is a shortage of these critical skills.

The effort to operationalize AI is complicated by the need to increase the accuracy of models. Bigger data sets, more hyperparameter tuning, and more complex AI algorithms translate to bigger, faster, more intricate, and more costly infrastructures. Hence the allure of public cloud infrastructures, which provide low startup costs and services categorized as operating expenses. What is needed is an on-premises infrastructure stack that provides performance and scalability for even the largest and most complex AI models while simplifying startup and deployment, at a total cost of ownership comparable to or less than public cloud services.

Deep Learning with Intel

Deep Learning with Intel is part of Dell EMC’s Ready Solutions for AI, a set of standardized infrastructure stacks for machine learning and deep learning designed to accelerate time to value.

Deep Learning with Intel is a scale-out cluster consisting of a single Dell EMC PowerEdge R740xd master/login node and 16 Dell EMC PowerEdge C6420 dense compute servers in four C6000 chassis. Each compute node is interconnected with both a 1 gigabit Ethernet connection for access to the outside network and Internet and a 10 gigabit Ethernet connection for internal traffic and data movement. An optional Dell EMC Isilon H600 storage cluster can be attached to the internal 10 gigabit Ethernet network by way of 40 gigabit Ethernet connections. Details of the system configuration are provided in Table 6 in the Appendix.

Deep Learning with Intel comes with deployment services to accelerate time to results and single contact support for the complete hardware and software stack. The validated hardware and software stacks combine Dell EMC PowerEdge servers, Dell EMC Isilon storage, high-speed networking, data science software, and AI libraries and frameworks into preconfigured, scalable, balanced systems.

Dell EMC has integrated and validated Intel’s open source initiative, Nauta, into the solution. Nauta is a distributed deep learning platform that leverages Kubernetes and Docker technologies to provide a multi-user distributed DL training and testing computing environment. Nauta both simplifies the DL training workflow and leverages the power of Kubernetes for automating deployment, scaling, and management of containerized3 applications. In addition to a command line interface (CLI), Nauta includes both a web-based graphical user interface (GUI) and TensorBoard integration, a suite of visualization tools for deep learning. These interfaces streamline and accelerate the data scientist’s experiment management workload. Nauta supports both batch and streaming inference for integrated model validation, and includes customizable model templates, simplifying model creation and training experiments.
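
To illustrate the developer experience, the sketch below shows the kind of ordinary TensorFlow training script that a Nauta model template might wrap. This is a minimal sketch, not Nauta's actual template pack structure: the dataset, model, and script name are illustrative placeholders. Because Nauta supplies the containerization, scheduling, and result collection around such a script, the script itself needs no Kubernetes- or Docker-specific code.

    # train.py -- a minimal, self-contained TensorFlow training script (illustrative only).
    # Nauta would package a script like this into a Docker container and schedule it
    # on the Kubernetes cluster via a template pack; the script needs no container code.
    import tensorflow as tf

    def main():
        # A small public dataset stands in for real training data.
        (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
        x_train = x_train / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

        # Training progress logged here is what Nauta surfaces through its GUI
        # and TensorBoard integration.
        model.fit(x_train, y_train, epochs=1, batch_size=128)

    if __name__ == "__main__":
        main()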

Organizations deploying Deep Learning with Intel will benefit from:

  • Fast deployment—Rather than forcing the organization to select, configure, integrate, and tune components into an AI stack, Deep Learning with Intel is a validated system deployed by Dell EMC services, shrinking the time to deploy an AI environment from months to weeks while reducing skillset requirements and operational risk.
  • Simplified configuration—Deep Learning with Intel is preconfigured with Nauta along with the TensorFlow distribution and includes the requisite deep learning supporting libraries optimized for Intel Xeon Scalable Processors.
  • Optimized use of shared resources—The integrated Nauta platform leverages Kubernetes, enabling automated orchestration of workflows allowing many experiments to be scheduled and run “hands free.” Thus, multiple data scientists can share the same AI infrastructure stack with minimal impact on system performance.
  • Rapid scalability—All Dell EMC Ready Solutions for AI are designed for rapid scalability. Organizations can increase compute power by adding compute nodes to the cluster with just a few mouse clicks, and can scale storage by non-disruptively adding nodes to the optional Isilon scale-out storage cluster, which increases storage capacity and performance linearly.

ESG Technical Validation

ESG performed evaluation and testing of Deep Learning with Intel at the Dell EMC HPC & AI Innovation Lab. Testing was designed to quantify the performance and scalability of the solution when training deep learning models. We were also interested in understanding the TCO advantages and how Nauta simplifies the model development workflow.

Accelerating AI Model Development

ESG used Google’s Tensor2Tensor (T2T), an open source library of deep learning models and datasets, to characterize the performance of Deep Learning with Intel. From the many DL problems included in T2T, we focused on language translation, using the Big Transformer model to train an English to German neural machine translator (NMT). The dataset contained 4.5 million sentence pairs, and performance was measured in tokens per second processed, where a token is a word part.
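
Tokens per second is simply the number of (sub)word tokens consumed by the trainer divided by elapsed wall-clock time. The sketch below shows one way such a throughput figure can be computed from step timings; it is a minimal illustration, and the step count, tokens per batch, and dummy workload are hypothetical placeholders, not values from this test.

    import time

    def tokens_per_second(run_step, num_steps, tokens_per_batch):
        """Measure training throughput: tokens processed divided by elapsed time.

        run_step         -- callable that executes one training step
        num_steps        -- number of steps to time
        tokens_per_batch -- (sub)word tokens consumed by each step
        """
        start = time.perf_counter()
        for _ in range(num_steps):
            run_step()
        elapsed = time.perf_counter() - start
        return (num_steps * tokens_per_batch) / elapsed

    # Hypothetical usage: a dummy step stands in for one real training iteration.
    if __name__ == "__main__":
        dummy_step = lambda: sum(range(100_000))  # placeholder for real work
        print(f"{tokens_per_second(dummy_step, 50, 8192):,.0f} tokens/sec")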

We measured the performance of T2T on two configurations. The first configuration consisted of the T2T model training running on a bare metal Intel hardware stack without the use of containers. The second configuration used the full Deep Learning with Intel infrastructure stack, and leveraged Nauta to run the T2T model training with Kubernetes and containers (containerized). For each configuration, we ran the training multiple times, using one, two, four, eight, and 16 compute nodes. The results are shown in Figure 2 and Table 1.

What the Numbers Mean

  • In every test case, using containers for deep learning imposed no performance penalty.
  • In most cases, leveraging containers resulted in a modest performance increase, most likely due to reduced kernel context switching and Kubernetes’ efficient scheduling algorithms.

Infrastructure Scaling

ESG used the training results to understand the scaling performance of Deep Learning with Intel. Figure 3 plots the Nauta containerized results against an extrapolation of linear scaling, using the one compute node configuration as the base performance for the system. The plot uses logarithmic axes, where each division represents a doubling of the value of the previous division. The results are also detailed in Table 2.
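
Scaling efficiency here means measured throughput at N nodes divided by N times the single-node throughput, i.e., the fraction of ideal linear scaling achieved. A minimal sketch of the calculation follows; the throughput numbers in the example are placeholders for illustration, not the figures from Table 2.

    def scaling_efficiency(throughput, base_nodes=1):
        """Efficiency vs. linear scaling for {node_count: tokens_per_second}."""
        base = throughput[base_nodes] / base_nodes
        return {n: t / (n * base) for n, t in throughput.items()}

    # Hypothetical throughput measurements (tokens/sec), NOT the report's data.
    measured = {1: 1000, 2: 1950, 4: 3800, 8: 7400, 16: 13200}
    for nodes, eff in scaling_efficiency(measured).items():
        print(f"{nodes:>2} nodes: {eff:.0%} of linear")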

What the Numbers Mean

  • Deep Learning with Intel, leveraging Nauta, scaled nearly linearly, achieving 80% or more of theoretical maximum throughput as the number of compute nodes was increased by a factor of 16.

Why This Matters

Performance and scalability are key concerns for data scientists training AI models. Faster solutions enable data scientists to test with bigger data sets and experiment with more hyperparameter combinations, which can produce more accurate models and faster convergence to production-ready AI models.

ESG validated that Deep Learning with Intel gains performance by leveraging containerization and orchestration. Model training ran up to 18% faster with Nauta, and the solution processed 16,960 tokens per second with 16 compute nodes running containerized learners in parallel.

Performance of Deep Learning with Intel scaled nearly linearly, with the solution achieving 80% or more of theoretical maximum throughput as training was scaled from one to 16 compute nodes.


Improving AI Program TCO

ESG evaluated the total cost of ownership (TCO) of Deep Learning with Intel and compared it with the TCO of performing the same AI workloads using a leading public cloud AI service. For the TCO comparisons, we modeled two scenarios: deep learning model training and deep learning inferencing. Note that Deep Learning with Intel is not well suited for production inferencing, as its inference latency will be higher than most production environments require; the solution can, however, be used to test and validate model inferencing. We also compared the TCO of on-premises infrastructures using only CPUs with that of on-premises infrastructures using GPU accelerators.

The TCO model for Deep Learning with Intel used commonly available street pricing and modeled hardware, software, services, IT management, power, and cooling costs. Hardware costs included server chassis, racks, network switches, and cabling. Software costs included the yearly licensing fees for all installed licensed software. The solution includes professional services that encompass installation and configuration of the solution on-premises. We estimated the cost of IT management, power, and cooling to be 30% of the cost of the servers and network switches, and the systems were assumed to be running 24 hours per day, seven days per week.
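
The arithmetic behind the on-premises side of the model can be summarized in a few lines. The sketch below is illustrative only: every dollar figure is a placeholder rather than the street pricing actually used, and since the report does not state whether the 30% management/power/cooling estimate is annual or total, the sketch treats it as a one-time total.

    def onprem_tco_3yr(hw_cost, sw_license_per_year, services_cost,
                       server_and_switch_cost, overhead_rate=0.30):
        """Three-year on-premises TCO per the model described above.

        IT management, power, and cooling are estimated as a fixed
        fraction (30% in the report's model) of server and switch cost.
        All inputs are placeholders; the report used street pricing.
        """
        overhead = overhead_rate * server_and_switch_cost
        return hw_cost + 3 * sw_license_per_year + services_cost + overhead

    # Hypothetical figures for illustration only:
    print(f"${onprem_tco_3yr(300_000, 25_000, 30_000, 280_000):,.0f}")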

The TCO model for running the same workload on a leading public cloud service used the published price list in March 2019. The model includes the cost for compute time, static data storage, data transfer in and out of the cloud, and a direct network connection into the public cloud data center. Compute time was modeled at 12 hours per day, seven days per week. Data storage was modeled assuming 10 TB of static storage, 10 TB ingress per month, and 10 TB egress per month.
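
The public cloud side of the model follows the same pattern. Again, this is a hedged sketch: the rates below are placeholders, not the March 2019 price list the report used, and only the hours, storage volumes, and transfer volumes come from the scenarios described in this section.

    def cloud_tco_3yr(compute_rate_per_hr, instances,
                      storage_tb_month_rate, transfer_per_tb,
                      direct_connect_per_month):
        """Three-year public cloud TCO per the model described above:
        compute at 12 hours/day, 7 days/week; 10 TB of static storage;
        10 TB ingress and 10 TB egress per month; plus a direct network
        connection into the cloud data center. All rates are placeholders.
        """
        months = 36
        compute = compute_rate_per_hr * instances * 12 * 365 * 3
        storage = storage_tb_month_rate * 10 * months
        transfer = transfer_per_tb * (10 + 10) * months  # ingress + egress
        network = direct_connect_per_month * months
        return compute + storage + transfer + network

    # Hypothetical rates for illustration only:
    print(f"${cloud_tco_3yr(0.60, 80, 25, 90, 300):,.0f}")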

Model Training

We first evaluated the three-year TCO for deep learning model training. The Deep Learning with Intel model was based on a 16 compute node configuration.

The public cloud service separates the costs of running Jupyter notebooks from the costs of training the model. We modeled ten notebook instances to support the simultaneous work of ten data scientists. We modeled 80 training instances, where each instance has eight virtual CPUs (a virtual CPU is roughly equivalent to a physical CPU core), for a total of 640 virtual CPUs to match Deep Learning with Intel’s 640 CPU cores. Table 3 and Figure 4 compare the three-year TCO for training between Deep Learning with Intel and the public cloud service.

What the Numbers Mean

  • For deep learning training, the three-year TCO for Deep Learning with Intel is $238,000 less than running the same workload on a leading public cloud AI service.
  • For deep learning training, the three-year TCO for Deep Learning with Intel is 24% less while providing services for twice as many data scientists with double the compute time (24 hours per day versus 12 hours per day) and ten times the storage capacity (100 TB versus 10 TB).

Inferencing

Next, we modeled the three-year TCO for deep learning inferencing, where the trained deep learning model is applied to new data to make predictions. Note that the TCO calculations do not account for the latency and scale considerations that make Deep Learning with Intel unsuitable for production inferencing applications.

The TCO model for Deep Learning with Intel modeled a four compute node configuration and included optional hardware co-processors dedicated to accelerating the inferencing computation.

As with the training scenario, our three-year TCO model for the public cloud service includes ten notebook instances to support the simultaneous work of ten data scientists. The public cloud service does not publish CPU core counts for its inferencing service, instead publishing floating point math performance. We modeled 80 inference instances, where each instance has 2 GB of memory, to maintain compute power parity with Deep Learning with Intel’s 160 CPU cores. Table 4 and Figure 5 compare the three-year TCO for inferencing between Deep Learning with Intel and the public cloud service.

What the Numbers Mean

  • For deep learning inferencing, the three-year TCO for Deep Learning with Intel is almost $61,000 less than running the same workload on a leading public cloud AI service.
  • For deep learning inferencing, the three-year TCO for Deep Learning with Intel is 13% less while providing double the compute time (24 hours per day versus 12 hours per day) and ten times the storage capacity (100 TB versus 10 TB).

On-premises Accelerated Deep Learning

Lastly, we compared the three-year TCO of Deep Learning with Intel to that of a comparable GPU-accelerated stack. This enabled us to understand the TCO impact of substituting GPUs for CPUs in on-premises deep learning environments. Based on information from Intel and empirical evidence, we configured three times as many CPU-only compute nodes as GPU-accelerated nodes to maintain compute power parity between the two configurations: a 12 compute node CPU configuration versus a four compute node configuration with GPU accelerators. Table 5 and Figure 6 compare the three-year TCO for the two configurations.

What the Numbers Mean

  • Deploying GPU accelerators for deep learning imposes an additional $295,000 cost (34% more) for comparable performance.

Why This Matters

According to ESG research, the cost of infrastructure is surveyed organizations’ most often cited AI/ML challenge.4 Thus, it’s no surprise that public cloud AI services are appealing, as they come with low startup costs, and services are categorized as operating expenses.

ESG validated that the three-year TCO for Deep Learning with Intel, an on-premises solution, is significantly lower than that of utilizing public cloud AI services. For deep learning model development, Deep Learning with Intel provides a 24% cost savings, more than $238,000. For deep learning inferencing, the Dell EMC on-premises solution provides a 13% cost savings, almost $61,000.

Public cloud AI service costs can vary widely, and monthly charges can be surprisingly high when inadvertent mistakes lead to runaway processes that consume excessive, expensive CPU time or generate massive volumes of data. Data scientists’ experimentation can consume more compute time than originally anticipated, increasing costs and breaking budget assumptions. In contrast, the Deep Learning with Intel on-premises solution provides managers and financial accountants with known and predictable expenses.


Improving Ease of Use

ESG evaluated how Deep Learning with Intel simplified deployment of the AI infrastructure stack and accelerated time to results for the data scientist. AI infrastructure stacks are complex, comprising a hardware stack with massive compute power, storage, and networking, and a software stack that integrates a combination of open source and licensed software. Selecting, integrating, and tuning the proper components into a complete AI solution requires AI expertise as well as systems integration expertise.

Deep Learning with Intel provides all necessary software, compute, storage, and networking hardware, and the solution includes onsite installation and configuration by Dell EMC professional services. Thus, IT and data scientists can skip the time consuming and convoluted effort of installing and configuring operating systems, AI libraries, orchestration, and management software, saving weeks to months of effort.

Deep Learning with Intel comes integrated with the Nauta platform, which simplifies the process for the data scientist to get started with deep learning. As shown in Figure 7, rather than following the multi-step process used in traditional AI infrastructures, when using Nauta, the data scientist logs in to the system, specifies template parameters, and submits the deep learning training job. Nauta, leveraging Kubernetes for automation and orchestration, executes the training job, collects the output, and provides the results to the user in the Nauta GUI or with TensorBoard. The visualization tools make it easier for the data scientist to interpret the output and refine the deep learning model.

A common task for data scientists is hyperparameter tuning—choosing a set of optimal hyperparameters for the deep learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. The traditional method of hyperparameter tuning is a parameter sweep, which is an exhaustive search, where every combination of parameters is tested, and the parameter set that generates the best model is chosen. This methodology requires the data scientist to create a job script for each combination, submit the jobs, collect the results, and determine the best combination.
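
A parameter sweep is a Cartesian product over the candidate values of each hyperparameter. The sketch below makes the combinatorics concrete; the hyperparameter names and ranges are illustrative placeholders, and the submit step stands in for whatever job-submission mechanism the infrastructure provides.

    from itertools import product

    # Illustrative hyperparameter ranges (placeholders, not from the report).
    grid = {
        "learning_rate": [0.1, 0.01, 0.001],
        "batch_size":    [64, 128, 256],
        "hidden_size":   [512, 1024],
    }

    # Cartesian product: every combination becomes one training job.
    names = list(grid)
    jobs = [dict(zip(names, values)) for values in product(*grid.values())]
    print(f"{len(jobs)} jobs to run")  # 3 * 3 * 2 = 18 combinations

    for job in jobs:
        # Placeholder: in the traditional workflow, the data scientist writes
        # and submits one job script per combination by hand.
        print("submit:", job)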

Nauta leverages Kubernetes to automate and orchestrate the arduous hyperparameter tuning effort. The data scientist simply creates a single file containing the desired ranges for each hyperparameter, and then submits the job. Nauta automatically computes the entire set of hyperparameter combinations, runs jobs for each combination, collects the results, and presents the results in TensorBoard. As shown in Figure 8, this enables one-touch hyperparameter tuning, allowing the data scientist to devote her efforts to other tasks.

In traditional AI infrastructures, it takes an average of five minutes to configure a job and an additional minute to submit it. At six minutes per job, a hyperparameter tuning experiment with 300 jobs would therefore require 1,800 minutes (30 hours) of data scientist time and effort.

Using Nauta, configuring and submitting all 300 jobs would take just five minutes, freeing the data scientist to work on other tasks. Depending on job requirements, Deep Learning with Intel can execute up to 16 jobs simultaneously, potentially reducing total run time and improving time to deployment.

Why This Matters

Deep learning is complex and challenging, and the difficulty of developing models is exacerbated by a shortage of experienced or trained staff and the complexity of the infrastructure stack. Data scientists use a wide assortment of licensed and open source tools, greatly complicating the iterative, cyclical learning process that drives ML. This leads to issues with time to business value. A solution that simplifies infrastructure deployment and automates the DL model development process is needed.

ESG validation revealed that Deep Learning with Intel simplified deployment, with Dell EMC professional services performing the initial deployment and configuration. Nauta simplified and automated the data scientist’s workload and enabled unattended hyperparameter tuning. Deep Learning with Intel can reduce the data scientist’s time to set up a 300-job hyperparameter tuning experiment from 30 hours to just a few minutes, enabling him to focus his effort on other nontrivial tasks.


The Bigger Truth

Modern multi-core processors and GPUs have transformed AI from science fiction to reality, and any organization can go beyond proof of concept to reap the benefits of operationalized AI programs. According to ESG research, 45% of organizations expect to see value from their AI/ML initiatives in less than six months.5 However, these organizations face significant challenges, ranging from the cost and capabilities of the AI infrastructure to poor application development environments and a lack of experienced and skilled staff.

Deep Learning with Intel is a standardized AI infrastructure stack comprising servers, networking, storage, and AI software. The solution includes the Nauta platform to leverage container technology, automation, and orchestration. This validated and integrated hardware and software solution is tuned and optimized for AI initiatives, shortening deployment time, simplifying the data scientists’ workflow and workload, accelerating performance, and improving TCO.

ESG validated that using Nauta improves deep learning training performance on the Deep Learning with Intel solution. Containerized training workloads ran up to 18% faster than the same workloads on a bare metal system. Further, the solution scaled near-linearly, achieving 80% or more of theoretical maximum performance as the number of compute nodes was scaled from one to 16.

The Nauta platform’s orchestration and automation simplified model development, significantly reducing the number of steps in the workflow. Nauta’s automation enabled unattended hyperparameter tuning, simplifying an arduous and tedious task and enabling the data scientist to focus her efforts on other nontrivial tasks.

Deep Learning with Intel proved to be more cost effective than running the same workloads in the public cloud. Over three years, a 16 compute node solution was 24% cheaper than a leading public cloud AI service for deep learning training, and a four compute node solution was 13% cheaper for deep learning inference workloads. The three-year TCO for Deep Learning with Intel provided 24 hours per day of compute availability and 100 TB of storage capacity, compared to only 12 hours per day of compute consumption and 10 TB of storage consumption for the public cloud service.

Organizations seeking an AI/ML infrastructure stack that is easy to use and cost effective, and that enables their AI practitioners and data scientists to quickly and easily operationalize AI programs, should investigate how Deep Learning with Intel can simplify and accelerate their AI journey.

Appendix



1. Source: ESG Research Report, 2019 Technology Spending Intentions Survey, February 2019.
2. Source: ESG Master Survey Results, Artificial Intelligence and Machine Learning: Gauging the Value of Infrastructure, March 2019.
3. Containers rely on virtual isolation to deploy and run applications without the overhead of complete virtual machines for each application.
4. Source: ESG Master Survey Results, Artificial Intelligence and Machine Learning: Gauging the Value of Infrastructure, March 2019.
5. Source: Ibid.
This ESG Technical Validation was commissioned by Dell EMC and is distributed under license from ESG.


ESG Technical Validations

The goal of ESG Technical Validations is to educate IT professionals about information technology solutions for companies of all types and sizes. ESG Technical Validations are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objectives are to explore some of the more valuable features and functions of IT solutions, show how they can be used to solve real customer problems, and identify any areas needing improvement. The ESG Validation Team’s expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments.
