SC19, the premier supercomputing conference, is happening this week (November 17-22) at the Colorado Convention Center in Denver. You can find NetApp at Booth #249. This year, NetApp is highlighting technologies to satisfy both HPC and AI needs, including important collaborations with our partners NVIDIA and ScaleMatrix. The NetApp AI team is on hand to tackle your questions, and we’ll be demonstrating the latest technologies to accelerate HPC and AI workloads.
You’ve probably noticed that many of the recent blogs in this series have focused on the latest happenings and announcements at AI conferences and other industry events. The pace of innovation this year has been extraordinary, and NetApp’s AI team has been extremely busy attending events, meeting with customers, and evangelizing the benefits of NetApp AI technologies spanning from edge to core to cloud.
This blog is coming to you—in near real time—from SC19 in Denver, Colorado, a show where NetApp has maintained a presence for years. So far at this year’s show, I’m struck by the increasing intersection between HPC and AI. In fields ranging from aircraft design to weather forecasting to seismic exploration, researchers and engineers are augmenting traditional HPC approaches with AI to produce deeper insights and faster results.
As I’ve discussed in previous blogs (see my recent blog Infrastructure Design for Autonomous Vehicle Development), your path to AI may depend on where you start and what your existing skill sets are. Our job at NetApp is to deliver solutions that help you make a more seamless transition to AI regardless of your entry point, by removing bottlenecks and enabling data pipelines that deliver critical data where it’s needed at the speed it’s needed.
It’s no surprise that the teams attending SC19 are largely approaching AI from an HPC perspective. Yesterday afternoon (Monday, November 18) NVIDIA CEO Jensen Huang delivered a keynote and introduced two new solutions in collaboration with NetApp and others to help HPC teams succeed with HPC and AI: NVIDIA DGX SuperPOD and NVIDIA Magnum IO.
NVIDIA SuperPOD Brings Supercomputing to the Enterprise
NetApp and NVIDIA have been collaborating closely for a number of years, combining the benefits of NVIDIA expertise in GPU computing and NetApp’s proven solutions for high-performance storage and advanced data management. ONTAP AI—a market-leading joint solution combining NVIDIA DGX-1 and DGX-2 with NetApp All Flash FAS storage—was first introduced in 2018. (I described ONTAP AI in my previous blogs here and here.)
NVIDIA DGX SuperPOD is the newest addition to our joint solutions portfolio, simplifying supercomputing and enabling AI. The DGX SuperPOD is designed to support extreme HPC and AI workloads requiring multi–petaflop-scale compute power, in a systemized solution that can be deployed in just weeks, making HPC-scale technology more accessible to the enterprise. This turnkey solution takes the complexity and guesswork out of HPC and delivers a complete, validated solution stack (including best-in-class compute, switches, networking, and storage) for deployments at scale. Running NVIDIA AI software on the DGX SuperPOD provides a high-performance DL training environment for large-scale, multi-user AI software development teams.
DGX SuperPOD configurations start at 32 DGX-2 nodes, each with 16 NVIDIA V100 Tensor Core GPUs. Each node delivers 2 petaFLOPS of AI compute power—the equivalent of hundreds of CPU-based servers. DGX SuperPOD reduces physical footprint and power consumption to a fraction of that of a comparable traditional compute cluster.
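To put those numbers in perspective, here is a quick back-of-the-envelope calculation for the entry configuration. It is only a sketch: the node count, GPUs per node, and per-node petaFLOPS come from the paragraph above, while the function name `superpod_totals` is made up for illustration, and the totals are simple peak multiplication rather than measured performance.

```python
# Figures from the text above; the aggregates are peak values obtained by
# simple multiplication, not benchmark results.
NODES = 32            # entry DGX SuperPOD configuration
GPUS_PER_NODE = 16    # V100 Tensor Core GPUs per DGX-2
PFLOPS_PER_NODE = 2   # AI compute per DGX-2, in petaFLOPS


def superpod_totals(nodes=NODES, gpus_per_node=GPUS_PER_NODE,
                    pflops_per_node=PFLOPS_PER_NODE):
    """Return aggregate GPU count and peak AI petaFLOPS (hypothetical helper)."""
    return {
        "gpus": nodes * gpus_per_node,
        "peak_ai_petaflops": nodes * pflops_per_node,
    }


print(superpod_totals())
```

For the 32-node starting point this works out to 512 GPUs and 64 petaFLOPS of peak AI compute.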
NetApp complements the DGX-2 compute capabilities with high-performance NetApp EF600 all-flash NVMe storage. NetApp E-Series and EF-Series storage has been highly regarded in the HPC community for more than a decade. The EF600 delivers 2M sustained IOPS, response times under 100 microseconds, 44GBps of throughput, and 99.9999% availability. With industry-leading density, the EF600 is the only end-to-end NVMe system to support 100Gb NVMe over InfiniBand (IB), 100Gb NVMe over RoCE, and 32Gb NVMe over FC, future-proofing your DGX SuperPOD.
To find out more about DGX SuperPOD, be sure to check out the following links:
- NetApp and NVIDIA Collaborate to Simplify AI
- NVIDIA and Partners Bring AI Supercomputing to Enterprises
- DGX SuperPOD video
NVIDIA Magnum IO Eliminates I/O Bottlenecks
NetApp and NVIDIA use GPU and data acceleration technologies to address emerging computing workloads like AI, genomics, ray tracing, analytics, databases, and seismic processing and interpretation. CPUs are increasingly a bottleneck to continued performance improvement on these workloads. RAPIDS technology, introduced earlier this year, bridges the CPU and GPU universes to accelerate the transition to GPUs.
With every performance improvement in one area, however, the bottleneck moves somewhere else. In GPU computing, the CPU typically controls data loading from storage to GPUs, creating I/O bottlenecks, especially for use cases where real-time data access is critical. NetApp and NVIDIA are working to eliminate this bottleneck and deliver further acceleration for HPC workloads.
NVIDIA Magnum IO is a multi-GPU, multi-node networking and storage I/O optimization stack. APIs integrate compute, networking, distributed file systems, and storage to maximize I/O performance and functionality. Magnum IO interfaces with CUDA-X™ HPC and AI libraries to accelerate I/O for a broad range of HPC, AI, data analytics, and visualization use cases.
GPUDirect Storage is a key feature of Magnum IO. It opens a direct data path between GPU memory and storage, eliminating CPU bottlenecks. Taking the CPU out of the I/O pathway increases the demands on storage. With support for RDMA, NetApp EF600 all-flash NVMe storage delivers the performance and reliability required to keep up with the I/O demands of data-hungry GPUs.
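The difference between the two data paths can be pictured with a small toy model. This is purely illustrative and does not call any Magnum IO or GPUDirect Storage API: the `DataPath` class and the hop names are assumptions invented for this sketch, and it only counts how many transfers a block of data makes and whether it lands in CPU system memory along the way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPath:
    """Toy description of an I/O path (hypothetical, not a real API)."""
    name: str
    hops: tuple  # ordered places the data passes through

    def transfers(self):
        # Each adjacent pair of hops is one transfer of the data.
        return len(self.hops) - 1

    def stages_in_cpu_memory(self):
        return "cpu_memory" in self.hops


# Conventional path: data is staged in CPU system memory before reaching the GPU.
staged = DataPath("CPU-staged", ("storage", "cpu_memory", "gpu_memory"))
# Direct path, as described above: storage to GPU memory with no CPU staging.
direct = DataPath("direct", ("storage", "gpu_memory"))

for path in (staged, direct):
    print(path.name, path.transfers(), path.stages_in_cpu_memory())
```

In this model the staged path needs two transfers and the direct path needs one, which is why the storage system itself has to keep up once the CPU is no longer pacing the I/O.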
To find out more about NVIDIA Magnum IO, be sure to check out the following links:
- NetApp and NVIDIA: Taking High-Performance Computing to the Next Level
- NVIDIA Developer Blog: GPUDirect Storage: A Direct Path Between Storage and GPU Memory
AI Anywhere with ScaleMatrix
Successful AI requires integration of a wide range of hardware and software elements. In addition to our NVIDIA partnership, NetApp is joining forces with a rapidly growing ecosystem of the most innovative AI vendors, offering a wide range of solutions and services to streamline on-premises, cloud, and colo deployments.
Delivering the power density and cooling necessary for the latest high-performance GPUs is one of the challenges that comes with scaling AI projects. NetApp first introduced solutions and hosting services from our colocation partner ScaleMatrix at GTC in San Jose this year, combining ScaleMatrix and Dynamic Density Control (DDC) cabinet technology with the power of ONTAP AI.
Now we are taking the partnership a step further by leveraging edge capabilities and the mobility and modularity of the DDC cabinet tech to deliver ONTAP AI as a plug-and-play solution—orderable with a single SKU—that can be deployed anywhere, providing a self-contained environment, with guaranteed air flow, integrated security, and fire and noise suppression. ScaleMatrix solutions are ideal for AI and other high-performance workloads, including edge inferencing in retail, healthcare, and manufacturing. Suitable for office locations, the solution can be up and running within minutes—you just plug it in and power it on. Systems can be deployed and redeployed to another location with ease. No data center needed.
ScaleMatrix is showcasing its mobile (R-Series) and modular (S-Series) cabinets with ONTAP AI configurations in its booth at SC19. You’ll find ScaleMatrix at booth #2131. Read the ScaleMatrix SC19 blog here and the press release here.
To find out more about ScaleMatrix and ONTAP AI, visit http://www.scalematrix.com/netapp and check out the following videos:
- AI Anywhere – R-Series mobile cabinet
- AI Anywhere – S-Series modular cabinet with DGX-1
- AI Anywhere – S-Series modular cabinet with DGX-2
More Information and Resources
NetApp AI and NetApp data fabric technologies and services can jumpstart your company on the path to success. Visit us at booth #249 at SC19, and check out these resources to learn about ONTAP AI:
- Solution Brief: NetApp ONTAP AI
- White Paper: Edge to Core to Cloud Architecture for AI
- White Paper: Building a Data Pipeline for Deep Learning
- NetApp Validated Architecture: ONTAP AI – NVIDIA DGX-2 POD with NetApp AFF A800
- NetApp Validated Architecture: NetApp ONTAP AI with Mellanox Spectrum Switches
- Technical Report: AI at Scale with Trident, Kubernetes, and Kubeflow
- Customer Story: Cambridge Consultants Uses AI for World-Changing Innovation
And don’t miss our recent series about AI in the automotive industry: