IDC forecasts that by 2025, the global datasphere will grow to 163ZB (one zettabyte is a trillion gigabytes!). IDC also predicts that by 2019, third-platform (cloud, social, mobile, and big data) technologies and services will drive nearly 75% of IT spending and help enterprises unlock unique user experiences and a new world of business opportunities. However, enterprises face several technical challenges when deploying Hadoop to analyze this massive data growth, specifically in the areas of cluster availability, operations performance, and scaling. NetApp has developed a reference architecture with Hortonworks to deliver a solution that overcomes these challenges so that businesses can ingest, store, and manage big data with greater reliability and scalability, and with less time spent on operations and maintenance.

Why NetApp for Hadoop?

Decouple Storage from Compute

A decoupled Hadoop solution, in which storage and compute are segregated, allows independent scaling of storage and compute layers. This capability offers the flexibility to add storage capacity without adding compute nodes, resulting in higher server utilization and fewer, less expensive servers. It also eliminates the need for rebalancing or migration when new data nodes are added, making the data lifecycle nondisruptive.
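As a rough illustration of the sizing argument above, the following Python sketch compares how many servers a cluster needs when each server must supply both disks and CPU with how many it needs when storage lives on a separate array. Every number and function name here is a hypothetical assumption chosen for the example, not a NetApp or Hortonworks sizing guideline.

```python
import math


def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Servers needed when each server supplies both storage and compute.

    The cluster must be sized for whichever requirement is larger.
    """
    nodes_for_storage = math.ceil(storage_tb / tb_per_node)
    nodes_for_compute = math.ceil(cores / cores_per_node)
    return max(nodes_for_storage, nodes_for_compute)


def decoupled_compute_nodes(cores, cores_per_node):
    """Servers needed when storage is provided by an external array."""
    return math.ceil(cores / cores_per_node)


# Hypothetical workload and server shape (assumptions for illustration only).
CORES_NEEDED = 320
TB_PER_NODE = 48
CORES_PER_NODE = 32

coupled_before = coupled_nodes(960, CORES_NEEDED, TB_PER_NODE, CORES_PER_NODE)
coupled_after = coupled_nodes(1920, CORES_NEEDED, TB_PER_NODE, CORES_PER_NODE)
decoupled = decoupled_compute_nodes(CORES_NEEDED, CORES_PER_NODE)

print("coupled servers at 960TB:", coupled_before)
print("coupled servers at 1920TB:", coupled_after)
print("decoupled compute servers at either size:", decoupled)
```

In this made-up scenario, doubling the data forces the coupled cluster to double its server count even though the compute requirement is unchanged, while the decoupled cluster keeps the same compute servers and grows only the storage layer.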

Drive Extreme Performance and High Availability

NetApp® E/EF-Series storage systems can turbocharge access to your data and increase its value. Requiring just 2U of rack space, the systems combine up to 1M IOPS, sub-100-microsecond response times, and up to 21GBps of bandwidth with leading, enterprise-proven availability features.

Reduce Hadoop Replication Factor and Improve TCO

With NetApp Dynamic Disk Pool (DDP) technology on a NetApp E/EF-Series storage system, data and parity information are distributed across a pool of drives. Because the storage system already protects data at the drive level in this way, NetApp recommends using a Hadoop replication factor of 2 instead of 3. The lower replication factor puts less load on the network, so jobs complete faster. Also, keeping fewer copies of the data saves money by requiring less storage.
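To make the storage arithmetic concrete, here is a minimal Python sketch that compares the capacity consumed by HDFS-level copies at replication factors 3 and 2. In HDFS, the replication factor is normally set with the `dfs.replication` property. The 100TB dataset size and the function name are hypothetical, and the sketch counts only HDFS copies; it ignores the parity overhead inside the storage system, which is not quantified here.

```python
def replicated_capacity_tb(dataset_tb: float, replication_factor: int) -> float:
    """Capacity consumed when HDFS stores every block replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return dataset_tb * replication_factor


# Hypothetical dataset size (an assumption for illustration only).
dataset_tb = 100.0

capacity_rf3 = replicated_capacity_tb(dataset_tb, 3)
capacity_rf2 = replicated_capacity_tb(dataset_tb, 2)

# Fraction of HDFS-level capacity saved by moving from 3 copies to 2.
savings_fraction = (capacity_rf3 - capacity_rf2) / capacity_rf3

print(f"replication factor 3: {capacity_rf3:.0f}TB")
print(f"replication factor 2: {capacity_rf2:.0f}TB")
print(f"capacity saved: {savings_fraction:.1%}")
```

Under these assumptions, dropping from three copies to two removes one of every three stored copies, so the HDFS-level footprint shrinks by about a third. Each block write also produces one fewer replica to send between data nodes, which is where the reduced network load comes from.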

Maintain Enterprise-Class Data Protection

With five issued patents, DDP technology is designed to deliver worry-free storage through effortless management and self-optimization while maintaining predictable performance under any condition, including recovery from drive failures. With rebuilds that are up to 4 times faster, DDP technology significantly reduces exposure to multiple cascading disk failures, providing excellent data protection.

Enable Enterprise-Level Scalability and Flexibility

The NetApp SANtricity® software that comes with the E/EF-Series systems offers an intuitive and user-friendly interface from which extra storage (up to 4.8PB) can be added seamlessly. This flexibility makes it convenient to scale the storage capacity up or down as needed without affecting any running jobs.

Easy to Deploy and to Use

There is a steep learning curve for customers who are new to Hadoop. Few enterprise applications are built to run on massively parallel clusters. However, the NetApp E/EF-Series solution for Hadoop provides an operational model for a Hadoop cluster that does not require additional attention after its initial setup. The cluster is stable and easy to maintain, allowing you to concentrate on meeting your business needs. This solution flattens the operational learning curve of Hadoop.

Hortonworks Certified

Hortonworks is a leading innovator in the industry, creating, distributing, and supporting enterprise-ready open data platforms and modern data applications. The company actively contributes to open source communities such as Apache Hadoop, NiFi, and Spark. NetApp has partnered with Hortonworks to certify the NetApp Hadoop solutions. For more information, refer to the Hortonworks website. The following table shows all the E/EF-Series products and network protocols that you can use to create your Hadoop cluster using NetApp and Hortonworks.

For more details about why you should use NetApp for Hortonworks, refer to this solution brief, NetApp Solutions for Hortonworks. Also, to learn more about all the NetApp solutions for big data analytics, refer to the NetApp big data analytics website. And if you are attending the Strata Data Conference in New York City, September 11-13, 2018, visit us at booth #1439. NetApp is a Platinum Sponsor of the conference.

Mike McNamara

Mike McNamara is a senior manager of product and solution marketing at NetApp with over 25 years of storage and data management marketing experience. Before joining NetApp over 10 years ago, Mike worked at Adaptec, EMC, and Digital Equipment Corporation. Mike was a key leader driving the launch of the industry’s first unified scale-out storage system (NetApp), iSCSI and SAS storage system (Adaptec), and Fibre Channel storage system (EMC CLARiiON). In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent speaker at events.

Faiz Abidi

Faiz graduated with a Master’s in Computer Science from Virginia Tech in 2017. Since then, he has been working on the big data team at NetApp, focusing on Hadoop and Spark solutions. In his free time, he likes to read, run, and play badminton.