By Mike McNamara, Sr. Manager, Product Marketing, NetApp


While Hadoop has been used mainly on incoming, external data, there has also been a need to use it on existing, internal data, typically stored in network-attached storage (NAS). However, using Hadoop on internal data like this has a downside. Typically, it requires setting up another storage silo to host the data and then running the Hadoop analytics on that storage. This results in additional data management, greater inefficiency, and the added cost of moving the data between NAS and HDFS.


These drawbacks can be avoided with the NetApp NFS Connector for Hadoop, which allows analytics software to use NetApp clustered Data ONTAP®. The connector works with Apache Hadoop and Apache Spark: a simple configuration file change enables data on NFSv3 storage to be analyzed. By using clustered Data ONTAP, the connector decouples analytics from storage, leveraging the benefits of NAS. For even higher performance, the NetApp NFS Connector for Hadoop can be combined with Tachyon to build a scale-out caching tier that is backed by clustered Data ONTAP.
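To give a sense of what a "simple configuration file change" looks like in practice, here is an illustrative sketch of a Hadoop core-site.xml pointing at NFS storage. The property names, class name, and server address below are assumptions for illustration, not verbatim settings from this article; consult the connector's documentation for the exact values.

```xml
<!-- core-site.xml: illustrative sketch only. Property names, the
     FileSystem class, and the server address are assumptions. -->
<configuration>
  <!-- Make the NFS connector the default file system -->
  <property>
    <name>fs.defaultFS</name>
    <value>nfs://nfs-server.example.com:2049</value>
  </property>
  <!-- Register the connector's NFSv3 FileSystem implementation -->
  <property>
    <name>fs.nfs.impl</name>
    <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
  </property>
</configuration>
```

With a change along these lines, MapReduce and Spark jobs can address data with nfs:// URIs instead of hdfs:// URIs, without copying the data first.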


[Figure: NetApp Solutions for Hadoop and NFS Connector for Hadoop]



You can employ the NetApp NFS Connector for Hadoop to run big data analytics on NFSv3 data without moving the data, creating a separate analytics silo, or setting up a Hadoop cluster. You can start analyzing existing data with Hadoop right away. You can also leverage NFS Connector to run a proof of concept, then set up a Hadoop cluster using NetApp Solutions for Hadoop for data from external sources.


NFS Connector lets you swap out HDFS for NFS or run NFS alongside HDFS. NFS Connector works with MapReduce for compute or processing and supports other Apache projects, including HBase (columnar database) and Spark (processing engine compatible with Hadoop). These capabilities let NFS Connector support diverse workloads, including batch, in-memory, streaming, and more.

Mike McNamara

Mike McNamara is a senior leader of product and solution marketing at NetApp with 25 years of data management and data storage marketing experience. Before joining NetApp over 10 years ago, Mike worked at Adaptec, EMC, and HP. Mike was a key team leader driving the launch of the industry's first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage system and software (NetApp), iSCSI and SAS storage system and software (Adaptec), and Fibre Channel storage system (EMC CLARiiON). In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent speaker at events. Mike also published a book through FriesenPress titled "Scale-Out Storage - The Next Frontier in Enterprise Data Management" and was listed as a top 50 B2B product marketer to watch by Kapost.
