Scale-Out NAS: Driving Value for Rapidly Growing File-based Storage Environments

Author(s): Steve Duplessie, Terri McClure

Published: January 9, 2009

New Economic Realities

File Storage Requirements are Changing

The information we store today is very different from the information we stored 30 years ago. Content capture and creation devices have advanced to enable faster and more efficient business processes, and nowhere has the impact been felt more than in the data center storage domain. Chip manufacturers are rendering multi-terabyte files. Oil and gas exploration relies on 3-D models in the hundred-terabyte range, and healthcare, with high-definition and 4-D imaging, is creating files in the hundreds of megabytes. It seems no industry is safe from massive file data growth. Across the board, file formats are richer and file sizes are growing exponentially. The storage implications are profound.

The emergence of Web 2.0 applications only exacerbates the problem. In the past few years, nearly every electronic device we use on a daily basis has become a content capture and sharing device. The Web has changed everything, and new ways of using it, such as Web 2.0 applications, have driven a new information economy.

We are in the Internet Era of computing. Corporate computing environments, while lagging behind consumer markets, are slowly but steadily moving into the realm of Web 2.0.  Online communities, social networking, new media, collaboration, and other applications are pushing their way into the commercial computing world. Like it or not, IT is going to have to get ready for these rapidly evolving realities of today's business. Tools such as SharePoint, blogs, wikis, streaming media, and a host of other digital content creation and management applications are enabling organizations to redefine themselves in near real-time.

The massive amount of rich file data generated by richer file formats and Internet Era computing is creating demand for new and innovative scale-out file storage solutions that economically scale bandwidth and performance to previously unheard-of capacities. "Scale-out NAS" systems are designed from the ground up for economical, dynamic scaling and for supporting extremely high bandwidth applications. The enterprise adoption of commercial HPC applications and advances in digital media, combined with the advent of Web 2.0, now bring the requirement for these types of systems squarely into the data centers of corporate America.

Users Examine More Cost Effective Storage Approaches

Today, the challenge facing most enterprises is that file data growth is already out of control; the growth of file data has been outpacing e-mail and database-driven growth for quite some time now. In fact, in a recent ESG survey, more than one in five medium-size businesses cited rapid growth in file-based content as one of their most pressing storage challenges.[1] ESG also estimates that file-based data will account for 70% of total archived capacity by 2012.[2] For commercial enterprises, the faster growth and new file characteristics enabled by advances in content capture devices, combined with the emergence of Internet Era data, only worsen the problem. As such, ESG expects customers to continue to make new NAS purchases to accommodate this growth and to drive capital and operational cost efficiencies by consolidating sprawling file servers.

But consolidation and richer file data are not the only issues driving users to examine scale-out NAS solutions. The global financial crisis has driven IT to examine every new purchase with an increased focus on finding opportunities to reduce both capital and operational expenses. Technologies that reduce overall storage requirements or that drive higher levels of resource utilization are seeing a significant uptick in interest, and traditional ways of doing business are being re-examined to find ways to drive better efficiencies.

Scale-Out NAS Makes Economic Sense

Before understanding why scale-out NAS offers compelling economics, users first need to understand where traditional monolithic, or scale-up, solutions fall short. Multi-dimensional scale is a core requirement of rich file-based storage architectures as well as other applications with similar requirements. Scale-out, the ability to independently scale and tune bandwidth, processing, and storage capacity on the fly, all while managing the file system and a single global namespace, is becoming the new backbone of file-based storage solutions.

Scale-out storage architectures are significantly different from the monolithic, scale-up storage architectures (e.g., traditional NAS or SAN systems) that were developed to meet distributed computing needs.

Why Scale-Up Doesn't Make the Rich Media Cut

Scale-up storage is just what it sounds like; it is designed to be monolithic, where lots of storage sits behind one or two file server heads, and is designed to scale into the multi-TB range behind those file server heads.  Once the limit on storage is hit, a new monolithic system is installed; a new frame, controllers, and power supplies need to be powered up; and a new file system needs to be managed, even if there is only the need to add minimal incremental storage capacity.  There is no way to balance capacity and workload between systems, and migrating directories or files means remapping and remounting for each and every client with access.  Those that have been through it know the pain of the process; it can be excruciating in a large enterprise environment with lots of clients and zero tolerance for downtime.

Traditional scale-up systems have no economical way to independently scale performance without some significant price penalty along both capital and operational budget lines.  Performance in today's monolithic systems is often scaled by adding a storage rack and more spindles and then striping files across those spindles, increasing throughput and reducing latency, and, as a byproduct, reducing storage utilization.  This is an expensive proposition for serving large sequential files, which not only raises capital costs, but increases operational costs to provide enough floor space, power, and cooling, as well as the additional labor costs associated with managing and load balancing across spindles.  Additionally, with the poor utilization rates resulting from this type of implementation, more systems need to be deployed and managed, resulting in an even greater negative impact on the operating budget.

Scaling to meet the large capacity and high bandwidth performance demands of rich file data and Web 2.0 computing makes the cost exposure of using scale-up systems even more dramatic. Using scale-up systems in this type of environment means adding potentially hundreds of systems, all managed, provisioned, and tuned individually, in addition to the greater power, cooling, and floor space required. Scale-up systems are simply the wrong tool for the job; using scale-up systems for Web 2.0 is like using a can opener on a bottle of wine: you can probably find a way to make it work, but the aftermath is going to be messy and will take time to clean up.

High performance computing, life sciences, healthcare, oil and gas, Web 2.0, computer-assisted design and manufacture (CAD/CAM), and media and entertainment all share similar rich media and single writer/multi-reader characteristics.  Since traditional scale-up file servers were designed to meet the more transaction-oriented, small file nature of distributed computing, scale-up architectures often fall short in meeting performance requirements in these vertical markets, hence the early adoption of scale-out systems there. [3]

The Economics of Scale-Out NAS

Scale-out NAS not only meets rich media performance requirements, it does so cost efficiently. With independent scaling of storage capacity, processors, and bandwidth, users can grow when and as needed, without buying racks and power supplies in advance of capacity or buying extra spindles to stripe files across.  Consequently, scale-out NAS provides "just-in-time" scalability.  And with most scale-out systems, many low level storage management tasks are automated, such as expanding the file system when new physical capacity is added and load balancing performance across processors, significantly reducing management costs.

Adding processing power independently, as can be done with scale-out systems, saves more than floor and rack space. In addition to getting better performance, it significantly reduces power consumption relative to scale-up systems since processors typically use 95% less power than an additional disk shelf consumes.

In scale-out NAS systems, adding capacity and bandwidth, as well as file system expansion, is done online with minimal system performance impact. This granular scaling capability provides a price/performance advantage as it allows users to start small and scale where needed. And, since scale-out systems scale into the multi-petabyte range and are managed as a single entity under a global namespace, the systems can meet most users' needs without paying the management penalty associated with deploying tens or hundreds of scale-up systems.
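The price advantage of starting small can be illustrated with simple arithmetic. The sketch below compares buying three years of capacity up front against buying each year's increment when it is needed. Every figure in it (50 TB of growth per year, a starting price of 1,000 per TB, and a 20% annual price decline) is a hypothetical assumption chosen for illustration, not ESG data.

```python
def upfront_cost(yearly_growth_tb, price_per_tb_year0):
    """Buy all projected capacity on day one at today's price."""
    return sum(yearly_growth_tb) * price_per_tb_year0


def just_in_time_cost(yearly_growth_tb, price_per_tb_year0, annual_price_decline):
    """Buy each year's increment at that year's (declining) price."""
    total = 0.0
    for year, tb in enumerate(yearly_growth_tb):
        price = price_per_tb_year0 * (1 - annual_price_decline) ** year
        total += tb * price
    return total


# Hypothetical inputs, for illustration only.
growth_tb = [50, 50, 50]   # capacity added in years 0, 1, and 2
price_year0 = 1000.0       # cost per TB in year 0
decline = 0.20             # assumed yearly drop in price per TB

upfront = upfront_cost(growth_tb, price_year0)
jit = just_in_time_cost(growth_tb, price_year0, decline)
print(f"Up front:     {upfront:,.0f}")
print(f"Just in time: {jit:,.0f}")
print(f"Difference:   {upfront - jit:,.0f}")
```

The model deliberately ignores power, cooling, and floor space for capacity that sits idle, so under these assumptions it understates the gap between the two approaches.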

For users evaluating new NAS solutions, initial cost has become a higher priority than the advanced features and functions of scale-out NAS systems, though scale-out systems provide cost advantages that compound over time.  ESG recently conducted a survey of 504 North American and Western European IT professionals to assess data storage environments, including the adoption of scale-out NAS. Market drivers for early adopters included faster provisioning, improved scalability and performance, easier management, and the need to support specific, fast-growing applications. Lower cost of infrastructure was literally last on the list of buying criteria. However, planned and potential users have vaulted lower cost into the top tier of purchasing criteria, second only to improved scalability, which is the crux of the technology (see Figure 1).

Figure 1. Market Drivers for Scale-Out NAS Solutions


The trend towards focusing on acquisition price is not surprising in the current economic climate, but users should beware of sweet deals on acquisition price and focus instead on the true cost of ownership over time; initial price is only a small portion of the overall cost to own a system. Scale-out NAS architectures have a number of cost advantages over scale-up solutions, ranging from start-up costs to managing technology refreshes, and most of the steps in between. Scale-out NAS carries a significantly lower infrastructure cost compared to scale-up systems for a number of reasons:

  • Low entry cost: The entry cost for scale-out systems varies depending on the minimum configurations supported. Most systems start as small as two nodes and scale out from there. A clustered storage system ultimately provides parallelism that can scale from small to massive. Enterprise-class scale-up storage systems also offer massive parallelism, but you have to buy a big system and fill it with disk drives over time, powering, cooling, and taking up floor tiles well ahead of putting additional capacity online. You have to make a heavy investment at the outset, projecting what you will need for the next three to five years. With clustered scale-out systems, you can add resources as needed.
  • Riding the commodity curve: Scale-out NAS systems typically use low-cost, high capacity, commodity SATA disk drives.  Because of their multidimensional scale and load balancing capability, the slower performance of SATA drives relative to Fibre Channel drives can be mitigated and is entirely suitable for the markets we've discussed, like HPC, life sciences, and Web 2.0.  The same cost advantages are typically found on the NAS head, where commodity processors are used.  Because of the granular scalability, users don't need to buy frames or processors far ahead of disks themselves, so users typically get better pricing as Intel processors and disk prices decline in cost over time.  Riding the Intel and high capacity disk commodity curve can add up to significant cost savings, especially at the scale seen in these types of environments.
  • Just-in-time scalability: As previously stated, because of the modular nature of scale-out systems, there is no need to buy (and power or cool) frames, power supplies, and mostly empty cabinets in advance of storage capacity.  Putting frames on the floor that will be filled over time adds to labor costs to manage and maintain the systems as well.
  • Higher utilization rates: Better utilization means deferred purchases of new capacity. Since all of the NAS heads in scale-out systems can address the entire pool of usable capacity in the cluster, there is no capacity locked away behind underutilized NAS heads, a common problem in scale-up systems. It is not unusual to see utilization rates of 30% or less in scale-up systems and 60% or more in scale-out systems. Some scale-out vendors report utilization rates greater than 80%.
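To put the utilization figures from the last bullet in concrete terms, the following sketch computes how much raw capacity must be purchased to hold a given amount of data at each utilization rate. The 100 TB data set is a hypothetical assumption; the 30% and 60% rates are the ones quoted above.

```python
def raw_capacity_needed(stored_tb, utilization):
    """Raw capacity (TB) to purchase so that stored_tb fits at the given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return stored_tb / utilization


stored_tb = 100.0  # hypothetical amount of file data to store

scale_up_raw = raw_capacity_needed(stored_tb, 0.30)   # scale-up rate cited above
scale_out_raw = raw_capacity_needed(stored_tb, 0.60)  # scale-out rate cited above

print(f"Scale-up raw capacity:  {scale_up_raw:.0f} TB")
print(f"Scale-out raw capacity: {scale_out_raw:.0f} TB")
print(f"Ratio: {scale_up_raw / scale_out_raw:.1f}x")
```

At these rates the scale-up environment needs twice the raw capacity for the same data, before counting the power, cooling, and floor space that the extra capacity consumes.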

Relative to scale-up systems, operational savings can be achieved over time with scale-out systems thanks to:

  • Reduced change management planning cycles. When one file can be multiple terabytes in size, conventional three or six month change management planning cycles are no longer effective. Requirements are unpredictable and time-to-provision is more important than ever. The modular and easily scalable characteristics of scale-out NAS allow for extremely fast provisioning. The lengthy change management and provisioning process required for monolithic systems just isn't fast enough to respond to today's rich media demands. Organizational agility demands that change management cycles be reduced, and scale-out NAS allows that to happen.
  • Non-disruptive technology refresh. With most scale-out systems, the process of managing technology refreshes is faster and easier than with monolithic NAS. In a clustered NAS architecture, everything is redundant: the data paths, NAS heads, and the data itself. Several scale-out vendors provide both forward and backward compatibility with new versions of hardware, firmware, and software, so new versions can co-exist in the same system as older versions. This provides users with the ability to do rolling upgrades, plugging new nodes into the system and unplugging nodes when they need to be retired. Each scale-out vendor does this slightly differently and it is not a process that should be undertaken lightly, but it is a vast improvement over the lengthy process required to migrate terabytes of data off of a monolithic system and onto a new one.
  • Ability to scale capacity without scaling headcount. Essentially, it should be just as easy to manage a clustered storage system with 100 nodes as it is to manage one with two nodes.  Scale-out NAS systems enable this through a global namespace.  This is a simple concept that is extremely difficult to achieve.  In layman's terms, a global namespace is a virtual representation of a group of disparate physical file systems.  It sits between clients and the assorted file servers in a given environment and adds a layer of abstraction that divorces what the client sees as mount points from the physical server mount points.  It is a map that translates the virtual mount points to physical file servers and presents users with one consolidated view of the file server ecosystem.  It is the secret sauce that enables a single point of management and non-disruptive data migration.  Regardless of how big the cluster gets, it should still remain a single logical system to manage. The ease of management over the lifecycle of the storage system is even more valuable than scalable performance.
  • Automated, policy-based management. Removing the need for human intervention in low-level storage management functions is another way that scale-out NAS reduces management cost. Most scale-out file storage systems support deep levels of policy-based self management and healing. Most systems are plug-and-play: add a storage or processor node, and the system self-discovers and expands the file system or incorporates it into load balancing algorithms on the fly. There is typically no disruption of service and no requirement to plan data layouts, create LUNs, or migrate data. Many of these products are newer to the market and have been designed from the ground up to automate storage management processes. Scale-out systems typically absorb new processor, bandwidth, and storage capacity, then automatically re-balance and optimize across the newly added resources with little or no human intervention. This is significantly different from managing these functions in scale-up NAS systems. Most scale-up systems have hot spot reporting and some have load balancing across drives within RAID groups, paths, or host bus adapters. Some of the load balancing is manual, some automated, but scale-up systems do not have the capability to automatically balance loads across NAS heads, at least not without adding a virtualization appliance to mask the move from clients. For scale-up systems, it is a fully manual process to balance workloads across NAS systems, one that takes significant time and effort to migrate file systems and directories and remap mount points.
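The global namespace described in the list above is, at its core, a lookup table from virtual mount points to physical file servers. The sketch below is a minimal illustration of that idea only; it is not how any particular vendor implements it. The class names, server names (filer-a, filer-b), and export paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    server: str  # hypothetical file server name
    export: str  # hypothetical exported path on that server


class GlobalNamespace:
    """Maps the virtual mount points clients see to physical server exports."""

    def __init__(self):
        self._table = {}

    def publish(self, virtual_mount, server, export):
        mount = virtual_mount.rstrip("/")
        if not mount.startswith("/"):
            raise ValueError("virtual mount point must be an absolute, non-root path")
        self._table[mount] = PhysicalLocation(server, export.rstrip("/"))

    def migrate(self, virtual_mount, new_server, new_export):
        """Repoint a virtual mount at a new physical home; client paths do not change."""
        mount = virtual_mount.rstrip("/")
        if mount not in self._table:
            raise KeyError(virtual_mount)
        self._table[mount] = PhysicalLocation(new_server, new_export.rstrip("/"))

    def resolve(self, client_path):
        """Translate a client-visible path into 'server:physical_path'."""
        matches = [m for m in self._table
                   if client_path == m or client_path.startswith(m + "/")]
        if not matches:
            raise KeyError(client_path)
        mount = max(matches, key=len)  # longest matching mount point wins
        loc = self._table[mount]
        return f"{loc.server}:{loc.export}{client_path[len(mount):]}"


ns = GlobalNamespace()
ns.publish("/projects", "filer-a", "/vol/proj")
path = "/projects/seismic/run1.dat"
before = ns.resolve(path)
ns.migrate("/projects", "filer-b", "/vol7")  # data moved behind the scenes
after = ns.resolve(path)
print(before)
print(after)
```

The client keeps using the same path before and after the migration; only the table entry changes. That indirection is what removes the per-client remapping and remounting that scale-up migrations require.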

Based on the compelling economic benefits of deploying scale-out NAS solutions, it's no surprise that recent ESG research indicates that users are applying scale-out NAS systems to new use cases.  While most scale-out systems are tuned to perform well for high bandwidth applications, some can also be tuned to support the smaller transaction-oriented file serving requirements of today's distributed computing environments.  In fact, 43% of scale-out NAS users surveyed by ESG indicated that the technology is used to support database and OLTP transactions.  Further proof that scale-out NAS is increasing its footprint in the general storage space is that even though only 11% of those surveyed indicated they use scale-out NAS systems today, 40% indicated they plan to deploy it within the next 12 months and, while another 37% have no immediate plans to deploy scale-out NAS solutions, they are investigating the technology (see Figure 2).

Figure 2. Scale-Out NAS Adoption


Summary

Facing the stark reality of a prolonged economic slowdown, users are looking for a number of critical qualities in a storage vendor. They want vendors they can trust, vendors that market proven solutions and products that offer real value in terms of cost savings and enabling business agility. Keeping up with data growth driven by new types of applications, richer media types, and the ubiquity of content capture devices requires a new approach to keeping storage costs in check. New rich media content is being created for everything from research and development, to training, to marketing, and is becoming a mandatory component of everyday business. Whether it's blogs, video, or HD imaging, content is easier than ever to create, and management will become harder than ever without significant changes.

Enterprises that deploy scale-out NAS solutions can get more value, dollar-for-dollar, from their infrastructure investments.  Scale-out NAS has a compelling value proposition relative to scale-up systems. Its lower infrastructure costs, power efficiency, and management efficiencies should put scale-out solutions on the short list for anyone deploying new NAS capacity.

There are certain trade-offs to be considered. Not all applications require scale-out solutions; there is still plenty of room for traditional big-iron monolithic NAS systems. Matching application performance profiles and business requirements to the proper storage platform is important, but IT managers have an opportunity to realize significant savings by deploying scale-out NAS solutions, including lower management costs, right-sizing, and scaling only in the required dimensions: capacity, processing, and/or bandwidth. The operational savings associated with just-in-time scale (reduced power, cooling, and floor space requirements; reduced storage management headcount; and faster response to provisioning fire drills) can all add up to a more efficient and agile enterprise.


 

[1] Source: ESG Research Report, Medium-Size Business Server and Storage Priorities, June 2008.

[2] Source: ESG Research Report, Digital Archiving: End-User Survey & Market Forecast 2006 - 2010, January 2006.

[3] For more information on the functional differences between scale-up and scale-out NAS systems, see: ESG Report, Next Generation File Storage, July 2008.
