The NetApp® In-Place Analytics Module (formerly known as NFS Connector for Hadoop) enables analytics software such as Apache Hadoop and Apache Spark to access data managed by NetApp ONTAP® data management software over the Network File System (NFS) protocol, with only a simple configuration file change. Because the data stays in ONTAP, the NetApp In-Place Analytics Module decouples analytics from storage and uses the benefits of NAS to share data.
You can run analytics on NFS data without moving the data, creating a separate analytics silo, or setting up a separate HDFS cluster. The In-Place Analytics Module can also be used side by side with current (or legacy) HDFS deployments without interrupting existing workloads. Several Hadoop ecosystem components are supported, including YARN, MapReduce, Spark, HBase, Pig, and Hive. The decoupled design allows the compute and storage layers to scale independently, so you can add storage capacity without also adding compute nodes. With this design, only one copy of the data is required. HDFS, by contrast, keeps three copies of the data by default, which makes it more costly and complex.
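The capacity difference between keeping one copy of the data and keeping three can be illustrated with simple arithmetic. The following is a minimal Python sketch, not a sizing tool: the function name `raw_capacity_tb` and the 100 TB dataset size are hypothetical, and the calculation deliberately ignores RAID or parity overhead, compression, deduplication, and Snapshot copies.

```python
def raw_capacity_tb(logical_tb: float, copies: int) -> float:
    """Return the raw capacity (TB) needed to hold `logical_tb` of data
    when `copies` full copies of every byte are kept."""
    if logical_tb < 0:
        raise ValueError("logical_tb must be non-negative")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return logical_tb * copies


# Hypothetical example: a 100 TB logical dataset.
logical_tb = 100.0

# Three copies, matching the HDFS default replication factor.
three_copy_raw = raw_capacity_tb(logical_tb, copies=3)

# One copy, as described for the decoupled NFS-based design.
single_copy_raw = raw_capacity_tb(logical_tb, copies=1)

# Raw capacity that the single-copy layout does not have to provision.
difference_tb = three_copy_raw - single_copy_raw

print(f"Three copies: {three_copy_raw:.0f} TB raw")
print(f"One copy:     {single_copy_raw:.0f} TB raw")
print(f"Difference:   {difference_tb:.0f} TB")
```

Under these assumptions, the three-copy layout needs three times the raw capacity of the single-copy layout. Real deployments should also account for the protection and efficiency features of each storage system, which this sketch leaves out.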
In addition, Hadoop or Spark clusters running in a public cloud can analyze data stored on NetApp Cloud Volumes or NetApp Private Storage (NPS). ONTAP data management technologies such as NetApp Snapshot™ and SnapMirror® help create efficient backups and replicas of Hadoop data. ONTAP also enables developers to access production data from their development Hadoop or Spark clusters.
Version 3 of the In-Place Analytics Module software is now available and offers the following enhancements:
- Network load distribution across multiple network interfaces, improving performance
- Hortonworks certification for the HDP platform
- Apache Ambari Module for management
- Simplicity enhancements such as user and access management
- Microsoft Azure integration with the HDInsight application
Read this technical report: “NetApp Hybrid Data Protection Solutions for Hadoop and Spark.”