Assembling and integrating off-the-shelf deep learning (DL) and machine learning (ML) compute, storage, networking, and software components can increase your system’s complexity and lengthen deployment times. As a result, valuable data science resources are wasted on systems integration work. Achieving predictable, scalable artificial intelligence (AI) performance is difficult, and keeping a DL and ML infrastructure up and running requires deep, full-stack AI expertise.
NetApp can help you fully realize the promise of AI by simplifying, accelerating, and integrating your data pipeline with the NetApp® ONTAP® AI proven architecture. ONTAP AI is powered by NVIDIA DGX servers and NetApp cloud-connected all-flash storage.
And it keeps getting better. NetApp ONTAP AI has now been tested with, and is supported on, Mellanox Spectrum Ethernet switches. With NetApp, you can deploy the first reference architecture for AI and ML that combines NVIDIA DGX-1 servers with Mellanox Ethernet switches.
This architecture pairs nine DGX-1 servers with a single NetApp AFF A800 all-flash storage system, delivering linear performance scaling across up to 72 GPUs. Mellanox Spectrum Ethernet switches running the Onyx OS provide high bandwidth and low latency, with full Converged Enhanced Ethernet (CEE) feature support so that you can use RDMA over Converged Ethernet (RoCE) for the GPU interconnect.
Based on the validation testing results, the ONTAP AI architecture delivers excellent training and inferencing performance, with enough storage headroom to support additional DGX-1 systems that you might want to deploy. You can easily and independently scale your compute and storage resources from half-rack to multirack configurations and achieve predictable performance that meets your ML workload requirements. For more details about the testing, read the NetApp Verified Architecture report.