As businesses in almost every sector look for new ways to gain a competitive advantage, Artificial Intelligence and Machine Learning are becoming critical to reaching those goals. In addition to extracting the most value possible from existing products and services, these technologies are enabling capabilities that were previously inconceivable. These next-generation applications require a new breed of infrastructure and management tooling for development and deployment, one that emphasizes flexibility and performance while still delivering the reliability and stability that enterprise IT operations demand.

ONTAP AI

ONTAP AI is a complete infrastructure stack for AI/ML workloads that delivers world-class performance along with NetApp’s exclusive cloud-connected data management capabilities. A single NetApp A800 NVMe storage system provides up to 25GB/s of read throughput, and up to 12 systems can be clustered together to deliver as much as 300GB/s to the NVIDIA DGX-1 GPU servers. With FlexGroup volume technology, the A800 cluster can present up to 75PB in a single namespace with the performance potential of the entire cluster. Cisco Nexus 3232C 100Gb Ethernet switches provide both the GPU cluster interconnect, using RDMA over Converged Ethernet (RoCE), and storage access over NFS.
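
The cluster-level figure is the per-system read bandwidth multiplied by the maximum cluster size. This assumes throughput scales roughly linearly as systems are added, which is how the 300GB/s maximum follows from the 25GB/s per-system number:

```latex
B_{\text{cluster}} = N \times B_{\text{system}} = 12 \times 25\ \text{GB/s} = 300\ \text{GB/s}
```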

Automating with Ansible

Ansible is a configuration management tool from Red Hat that is quickly becoming the standard for DevOps-style system administration, and NetApp is one of only 6 vendors certified to deliver modules in the official Ansible distribution. Ansible takes a declarative approach to managing hardware and software: the administrator specifies the intended state of the configuration in a set of easy-to-read YAML files, and Ansible makes whatever changes are needed to reach that state. This allows the state of the infrastructure to be managed with version control and change validation processes, just like any other software code.
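
To make the declarative idea concrete, here is a minimal Python sketch of desired-state reconciliation. It is a toy model, not how Ansible or the NetApp modules are implemented: the setting names and values in `DESIRED` are hypothetical, and `plan_changes`/`apply` are illustrative helpers. The point is that the administrator only describes the end state, and running the same description twice changes nothing the second time, much like an Ansible task reporting no changes on a repeat run.

```python
from typing import Any

# Hypothetical desired state, as an administrator might declare it in a YAML
# variables file. These names and values are illustrative only.
DESIRED: dict[str, Any] = {
    "mtu": 9000,
    "vlans": [100, 200],
    "nfs_enabled": True,
}


def plan_changes(current: dict[str, Any], desired: dict[str, Any]) -> dict[str, Any]:
    """Return only the settings whose current value differs from the desired one."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def apply(current: dict[str, Any], desired: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Converge `current` toward `desired` and report whether anything changed."""
    changes = plan_changes(current, desired)
    new_state = {**current, **changes}
    return new_state, bool(changes)


if __name__ == "__main__":
    device = {"mtu": 1500, "vlans": [100, 200], "nfs_enabled": False}
    device, changed = apply(device, DESIRED)
    print(changed)  # True: the first run applies the differences
    device, changed = apply(device, DESIRED)
    print(changed)  # False: the second run finds nothing to do (idempotent)
```

Because only differences are applied, the same declaration can be run repeatedly, which is what makes it safe to keep under version control and reapply after every change.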

Ansible uses a concept called ‘Roles’ to provide a standard set of tasks that can be executed multiple times, or on multiple devices, simply by changing some variables. For this demonstration I’ve created 3 separate roles: one for storage on the A800, one for networking on the Nexus switches, and one for the DGX-1 servers. Each role contains the tasks specific to that device, along with any default variables or other files needed to complete the configuration. Together, these three roles perform all of the configuration steps defined in the ONTAP AI NVA-1121 Deployment Guide, and site-specific details are supplied in a variables file associated with each role.
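
The reuse pattern behind roles can be sketched in a few lines of Python. This is an analogy only, not Ansible's role format: the `Role` class, the task strings and the variable names (`port`, `vlan`, `mtu`) are all hypothetical. It shows one set of task templates being rendered for two different switches, with site-specific variables overriding the role's defaults, mirroring how Ansible role defaults have the lowest variable precedence.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Role:
    """A named, reusable list of task templates (a toy stand-in for an Ansible role)."""
    name: str
    tasks: list[str]
    defaults: dict[str, str] = field(default_factory=dict)

    def render(self, site_vars: dict[str, str]) -> list[str]:
        """Fill each task template, letting site-specific values override defaults."""
        merged = {**self.defaults, **site_vars}
        return [Template(task).substitute(merged) for task in self.tasks]


# Hypothetical tasks and variable names, for illustration only.
network = Role(
    name="network",
    tasks=["set interface $port mtu $mtu", "add vlan $vlan to $port"],
    defaults={"mtu": "9216"},
)

if __name__ == "__main__":
    # The same role runs against two switches; only the variables change.
    for switch_vars in ({"port": "Eth1/1", "vlan": "100"}, {"port": "Eth1/2", "vlan": "200"}):
        for line in network.render(switch_vars):
            print(line)
```

Keeping the tasks generic and pushing every site-specific value into variables is what lets one role configure many devices without copying the task list.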

In the video below I’ll demonstrate how quickly Ansible can deploy the ONTAP AI configuration including network, storage and compute. There are over 70 specific configuration tasks across the 3 components, with some of those tasks repeated multiple times to produce the desired configuration. Check it out.

As you can see in the video, the entire configuration process takes less time than a trip to the coffee shop. After the initial deployment, Ansible provides fully automated configuration management, enabling consistent application of patches and updates as well as version-controlled changes to the running configuration of the infrastructure. Ansible can even be integrated into higher-level service portals, enabling self-service provisioning of physical and virtual infrastructure resources.

While we always strive for simplicity in design, a solution like ONTAP AI, and the workloads it supports, is by necessity more complicated than general-purpose IT infrastructure. Ansible removes much of the administrative cost of that complexity, reducing the time needed to start delivering value and improving the overall stability of the environment. Together, ONTAP AI and Ansible provide a next-generation application development platform with cutting-edge performance, flexibility and reliability that enables any business to harness the power of its data with Artificial Intelligence and Machine Learning technologies.

Please visit netapp.com/ai for more information about NetApp AI solutions, including ONTAP AI and the edge-to-core-to-cloud data pipeline enabled by NetApp storage systems. For more information about using Ansible in general, and with NetApp products in particular, check out netapp.io, the home for all of NetApp’s DevOps solutions. These articles by my colleague David Blackwell are a great introduction to using Ansible with NetApp.

David Arnette

David Arnette is a Sr. Technical Marketing Engineer focused on NetApp infrastructure solutions for Artificial Intelligence and Machine Learning. He has published numerous reference architectures for enterprise applications on the FlexPod converged infrastructure and has over 18 years’ experience designing and implementing datacenter storage and virtualization solutions.