Operationalize Data Science at Scale

The Challenge

Data lake investments are not new. They have been going on for a while, starting with big data and continuing through the major Hadoop adoption curve in 2017. Nevertheless, the main debate today is not whether Hadoop is dead, or whether customers have gotten value out of these systems; it’s the debate about where to go next. Today, the two questions that customers ask most often are:

  • It is time for our Hadoop system refresh. How do we reduce the complexity of our data lake and make it more cloud agnostic and automated?
  • How can we combine data analytics with artificial intelligence, machine learning, and deep learning (AI/ML/DL) tools in one platform?

As usual in technology, there is no single answer to these two questions. However, the increased adoption of Kubernetes among the data science community and the rise of microservices-based Hadoop data analytics tools are signaling a possible solution that addresses both questions. All you need is a solution that enables the full data analytics and AI/ML/DL lifecycle on one ready-to-use platform. This solution must help you keep up with your traditional data analytics tools, yet still enjoy modern AI microservices. Such a solution allows your data scientists, data engineers, DevOps staff, and app developers to collaborate efficiently on one secure self-service platform. Finally, this solution needs to give you a cloudlike experience on premises and allow you to connect to your cloud AI and analytics services. If you haven’t figured it out yet, we are talking about switching to Kubernetes for your big data and AI layer orchestration.

The Solution

To glue all these services together and provide a streamlined integration of big data analytics tools into your AI pipeline, NetApp is partnering with Iguazio. Iguazio provides a self-service platform that you can deploy in the cloud or on premises and that includes cloudlike data services, serverless functions, and popular big data and AI microservices. Using a Kubernetes-based managed platform connected to NetApp® systems through the Trident plug-in enables developers to focus on building applications instead of tedious integrations, infrastructure management, and operations.

Data Science Model from NetApp and Iguazio

This solution combines NetApp Trident, which integrates NetApp storage with Kubernetes, and Iguazio’s open-source Nuclio, which automates the data science pipeline with serverless functions. A developer can now submit code and requirements, and the serverless platform automates the deployment, scaling, and management of the application. Moreover, NetApp’s presence in the major hyperscalers enables customers to enjoy a true hybrid-cloud-enabled AI and data analytics experience. They can use their preferred big data tools such as Apache Spark or Presto as well as data science and AI tools such as Jupyter, TensorFlow, or PyTorch on premises, in the cloud, or in a hybrid model.
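
To make "submit code and requirements" more concrete, here is a minimal sketch of what such a function might look like. It follows Nuclio's Python convention of a handler(context, event) entry point that returns context.Response. The scoring logic, the THRESHOLD value, the "readings" field, and the make_local_context helper are all hypothetical and exist only so that the sketch can run without a Nuclio runtime; in a real deployment, the platform supplies the context and event objects, and dependencies would be declared separately in the function's build configuration.

```python
import json
from types import SimpleNamespace

# Hypothetical threshold for the placeholder "model" below.
THRESHOLD = 80.0


def handler(context, event):
    """Nuclio-style entry point: score one JSON record per invocation."""
    # The body may arrive as raw bytes/str or as an already-decoded dict,
    # so handle both cases defensively.
    body = event.body
    if isinstance(body, (bytes, str)):
        body = json.loads(body)

    readings = body.get("readings", [])
    if not readings:
        return context.Response(
            body=json.dumps({"error": "no readings supplied"}),
            headers={},
            content_type="application/json",
            status_code=400,
        )

    # Placeholder for real inference: flag the record when the mean
    # of the readings exceeds the threshold.
    mean = sum(readings) / len(readings)
    result = {"mean": mean, "anomaly": mean > THRESHOLD}
    context.logger.info("scored one record")
    return context.Response(
        body=json.dumps(result),
        headers={},
        content_type="application/json",
        status_code=200,
    )


def make_local_context():
    """Hypothetical stand-in for the context that the runtime provides."""
    logger = SimpleNamespace(info=lambda message: None)
    return SimpleNamespace(
        logger=logger,
        Response=lambda **fields: SimpleNamespace(**fields),
    )


if __name__ == "__main__":
    # Local smoke test only; on the platform, the handler is invoked
    # by a trigger (for example, an HTTP request) instead.
    response = handler(
        make_local_context(),
        SimpleNamespace(body=b'{"readings": [70, 90, 110]}'),
    )
    print(response.status_code, response.body)
```

The point of the sketch is the division of labor: the developer writes only the handler, while packaging, scaling, and routing of events to it are left to the serverless platform.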

Take Action Today

In conclusion, Hadoop environments can be expensive, complex, and difficult to scale. They often deliver poor utilization, which means that you might be paying for more compute resources than you use, or you might waste storage on duplicate data. Additionally, Hadoop-based extract, transform, load (ETL) tools are not always the best fit for data science and machine-learning pipelines. Still, there are customers out there who need help with their existing Hadoop investments. Such customers will want to transition to a simpler, cloud-friendly, automated platform that preserves the traditional Hadoop analytics tools, while allowing them to add open-source AI pipeline automation tools such as Kubeflow. Iguazio offers native integration with NetApp® solutions to enable building highly available end-to-end ML pipelines that can process and react to massive amounts of data. In particular, NetApp StorageGRID®, combined with NetApp® ONTAP AI, brings in the power of Amazon S3–connected AI tools and frameworks, further extending Iguazio’s capabilities and the ecosystem tools that run on it.

Finally, from our own in-house experience and through our interactions with installed base customers who have large Hadoop clusters, we understand that this transition might not be easy or quick. However, we are seeing a great appetite for a gradual approach toward this end-to-end platform. This gradual approach is another reason why customers are using the wide variety of NetApp data mover tools to help them move their data between these two systems as they transition smoothly to a Kubernetes-based architecture.

For more information about how to kick-start your transition and get a free trial of Iguazio on ONTAP AI, contact your NetApp sales representative.

Hoseb Dermanilian

Hoseb joined NetApp in 2014. In his current role, he manages and develops the AI and Digital Transformation business globally. Hoseb's focus is to propose and discuss NetApp's value add in the AI and Digital Transformation space and to help customers build the right platform for their data-driven business strategies. As part of business development, Hoseb is also focused on developing the NetApp AI channel business by recruiting and enabling the right AI ecosystem partners and enabling go-to-market strategies with those partners. Hoseb comes from a technical background. In his previous role, he was the Consulting System Engineer for NetApp’s video surveillance and big data analytics solutions.

Hoseb holds a master's degree with distinction in Electrical and Computer Engineering from the American University of Beirut, and he has multiple globally recognized conference and journal publications in the field of IP Security and Cryptography.
