The NetApp® In-Place Analytics Module (formerly known as NFS Connector for Hadoop) enables analytics software such as Apache Hadoop and Apache Spark to access data managed by NetApp ONTAP® data management software over the Network File System (NFS) protocol, with only a simple configuration file change. By building on ONTAP software, the NetApp In-Place Analytics Module decouples analytics from storage and uses network-attached storage (NAS) to share data.

Benefits

You can run analytics on NFS data without moving the data, creating a separate analytics silo, or setting up a separate HDFS cluster. The In-Place Analytics Module can also run side by side with existing HDFS deployments without interrupting current workloads. Several Hadoop services are supported, including YARN, MapReduce, Spark, HBase, Pig, and Hive. The decoupled design allows compute and storage to scale independently, so you can add storage capacity without also adding compute nodes. With this design, only one copy of the data is required. HDFS, by contrast, stores three replicas of each data block by default, which makes it more costly and complex.

In addition, Hadoop or Spark clusters running in a public cloud can analyze data stored on NetApp Cloud Volumes or NetApp Private Storage (NPS). ONTAP data management technologies such as NetApp Snapshot™ and SnapMirror® provide efficient backup and replication of Hadoop data. ONTAP also enables developers to access production data from their development Hadoop or Spark clusters.

New Version

Version 3 of the In-Place Analytics Module software is now available and offers the following enhancements:

  • Network load distribution across multiple network interfaces for improved performance
  • Hortonworks certification for the Hortonworks Data Platform (HDP)
  • An Apache Ambari module for management
  • Usability enhancements such as simplified user and access management
  • Microsoft Azure integration with Azure HDInsight

Learn More

Read this technical report: “NetApp Hybrid Data Protection Solutions for Hadoop and Spark.”

Mike McNamara

Mike McNamara is a senior manager of product and solution marketing at NetApp with over 25 years of storage and data management marketing experience. Before joining NetApp over 10 years ago, Mike worked at Adaptec, EMC, and Digital Equipment Corporation. Mike was a key leader driving the launch of the industry’s first unified scale-out storage system (NetApp), iSCSI and SAS storage system (Adaptec), and Fibre Channel storage system (EMC CLARiiON). In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent speaker at events.