Streamline AI workflows with NetApp Data Science Toolkit

To achieve business success with AI, you need rapid experimentation. Although the emerging world of machine learning operations (MLOps) offers many tools for iterative model training and deployment, most of them don’t streamline data management. And enterprise-caliber storage and data management platforms are often complex and unapproachable for the data scientists and data engineers who work on AI projects.

 

To fill this gap, we’ve developed the NetApp® Data Science Toolkit. This toolkit exposes NetApp’s industry-leading, multitenant data management capabilities through a simple, easy-to-use interface that’s designed for data scientists and data engineers. Packaged in the familiar form of a Python program, the toolkit enables data scientists and data engineers to provision and destroy data volumes in seconds. Because it also provides easy access to advanced storage features that would normally require help from a storage administrator, the toolkit delivers real business value by significantly speeding up projects.

A quicker, easier AI workflow

With the NetApp Data Science Toolkit, a data scientist can almost instantaneously create a data volume that’s an exact copy of an existing volume, even if the existing volume contains terabytes or even petabytes of data. Data scientists can quickly create clones of datasets that they can reformat, normalize, and manipulate, while preserving the original “gold-source” dataset. Under the hood, these operations use highly efficient and battle-tested NetApp cloning technology, but they can be performed by a data scientist without storage expertise. What used to take days or weeks (and the assistance of a storage administrator) now takes seconds.
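
To make the idea concrete, here is a toy Python model of why a clone can be created without copying the data itself. It is only an illustration of the general copy-on-write idea, not NetApp's implementation and not the toolkit's API; the `Volume` class and `clone` function are hypothetical names invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume (hypothetical): a name plus a table of block number -> contents."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)

    def write(self, block_no: int, data: bytes) -> None:
        # Rebinding an entry changes only this volume's block table.
        self.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        return self.blocks[block_no]


def clone(source: Volume, new_name: str) -> Volume:
    # Copy the block table only; the block contents themselves are shared.
    return Volume(new_name, dict(source.blocks))


gold = Volume("gold_dataset", {0: b"raw-images", 1: b"raw-labels"})
scratch = clone(gold, "scratch_copy")

# Right after cloning, both volumes point at the very same block objects.
shared_before = scratch.blocks[0] is gold.blocks[0]

# Reformatting the clone leaves the gold-source dataset untouched.
scratch.write(1, b"normalized-labels")
```

In this model the cost of a clone depends on the size of the block table, not on the amount of data stored, and a write to the clone never changes what the original volume reads back.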

 

[Figure: AI workflow for a data scientist]

Data scientists can also save a space-efficient, read-only copy of an existing data volume. Based on the famed NetApp Snapshot technology, this functionality can be used to version datasets and implement dataset-to-model traceability. In regulated industries, traceability is a baseline requirement, and implementing it is extremely complicated with most other tools. Now, with the Data Science Toolkit, it’s quick and easy.
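
One way to picture dataset-to-model traceability is a small lineage record that ties a trained model to the exact snapshot it was trained on. The sketch below assumes the snapshot has already been created by some other means; the function name, field names, and the volume and snapshot names are all hypothetical and are not part of the toolkit.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_lineage_record(model_bytes, volume_name, snapshot_name, trained_at=None):
    """Return a dict linking a model artifact to a dataset snapshot (hypothetical schema)."""
    trained_at = trained_at or datetime.now(timezone.utc)
    return {
        # A content hash identifies the model artifact unambiguously.
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        # Volume and snapshot names identify the exact dataset version used.
        "dataset_volume": volume_name,
        "dataset_snapshot": snapshot_name,
        "trained_at": trained_at.isoformat(),
    }


# Placeholder values for illustration only.
record = build_lineage_record(
    b"serialized-model-weights", "project1_data", "pre_training_run_42"
)
lineage_json = json.dumps(record, indent=2, sort_keys=True)
print(lineage_json)
```

Storing a record like this next to each model gives an auditor a direct path from a deployed model back to the read-only dataset version behind it.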

 

[Figure: AI training host infographic]

Data Science Toolkit versus AI Control Plane

The NetApp AI Control Plane is a full-stack solution that pairs popular open-source MLOps tools with NetApp technology so that you can rapidly manage AI data and experimentation. The Data Science Toolkit enhances this solution by making it much easier to manage data. A data scientist working within a Jupyter Notebook that was provisioned using the AI Control Plane can use the Data Science Toolkit to implement a data management task in one simple line of Python code. Likewise, a data engineer can easily run a Data Science Toolkit operation as a step within an Apache Airflow or Kubeflow Pipelines automated workflow.
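
The shape of such a workflow can be sketched in plain Python. This example does not use Apache Airflow, Kubeflow Pipelines, or the toolkit itself, all of which have their own APIs; `run_pipeline`, `clone_dataset_step`, and `train_step` are hypothetical stand-ins that only show a data management operation sitting in a pipeline as one ordinary step.

```python
from typing import Callable, Dict, List

# A step receives the shared pipeline context and updates it in place.
Step = Callable[[Dict[str, str]], None]


def run_pipeline(steps: List[Step], context: Dict[str, str]) -> Dict[str, str]:
    """Run each step in order against the same context (hypothetical runner)."""
    for step in steps:
        step(context)
    return context


def clone_dataset_step(context: Dict[str, str]) -> None:
    # Stand-in for the single data management call; here it only records a name.
    context["working_volume"] = f"{context['source_volume']}_clone"


def train_step(context: Dict[str, str]) -> None:
    # Training reads from the clone, never from the gold-source volume.
    context["trained_on"] = context["working_volume"]


result = run_pipeline(
    [clone_dataset_step, train_step], {"source_volume": "gold_dataset"}
)
```

Because the data management step has the same shape as any other step, it can be reordered, retried, or replaced without touching the training code.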

 

[Figure: NetApp Data Science Toolkit]

You can also use the Data Science Toolkit to integrate advanced NetApp data management capabilities into other MLOps platforms, including custom and homegrown platforms.

 

Alternatively, the Data Science Toolkit can serve as an easy-to-use, simple-to-manage standalone solution for smaller teams or teams that don’t need the overhead of a full-blown MLOps platform. The toolkit is compatible with NetApp Cloud Volumes ONTAP® software, so teams can use on-demand cloud compute resources in AWS, Microsoft Azure, or Google Cloud.

 

With the NetApp Data Science Toolkit, data management is no longer an impediment to a fast, streamlined AI process. To learn more, visit the toolkit’s GitHub repository.

Mike Oglesby

Mike is a Technical Marketing Engineer at NetApp focused on MLOps and Data Pipeline solutions. He architects and validates full-stack AI/ML/DL data and experiment management solutions that span a hybrid cloud. Mike has a DevOps background and a strong knowledge of DevOps processes and tools. Prior to joining NetApp, Mike worked on a line of business application development team at a large global financial services company. Outside of work, Mike loves to travel. One of his passions is experiencing other places and cultures through their food.
