IDC forecasts that by 2025, the global datasphere will grow to 163ZB (one zettabyte is a trillion gigabytes). IDC also predicts that by 2019, third-platform (cloud, social, mobile, and big data) technologies and services will drive nearly 75% of IT spending and help enterprises unlock unique user experiences and a new world of business opportunities. However, enterprises face several technical challenges when deploying Hadoop to analyze data at this scale, specifically in the areas of cluster availability, operations performance, and scaling. NetApp has developed a reference architecture with Hortonworks that addresses these challenges so that businesses can ingest, store, and manage big data with greater reliability and scalability, and with less time spent on operations and maintenance.
Why NetApp for Hadoop?
Decouple Storage from Compute
A decoupled Hadoop solution, in which storage and compute are segregated, allows independent scaling of storage and compute layers. This capability offers the flexibility to add storage capacity without adding compute nodes, resulting in higher server utilization and fewer, less expensive servers. It also eliminates the need for rebalancing or migration when new data nodes are added, making the data lifecycle nondisruptive.
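The effect of independent scaling on server count can be illustrated with a minimal sizing sketch. The workload figures and per-server specifications below are hypothetical values chosen for illustration, not NetApp sizing guidance, and the function names are not part of any product.

```python
import math

def coupled_nodes(capacity_tb, cores, tb_per_node, cores_per_node):
    """Servers needed when each node supplies both disk and CPU.

    The cluster must be sized for whichever resource runs out first.
    """
    return max(math.ceil(capacity_tb / tb_per_node),
               math.ceil(cores / cores_per_node))

def decoupled_nodes(cores, cores_per_node):
    """Compute servers needed when capacity lives on external storage."""
    return math.ceil(cores / cores_per_node)

# Hypothetical workload: 960 TB of capacity and 320 cores of compute,
# on servers that each offer 48 TB of disk and 32 cores.
print(coupled_nodes(960, 320, tb_per_node=48, cores_per_node=32))  # 20
print(decoupled_nodes(320, cores_per_node=32))                     # 10
```

In this example the storage requirement, not the compute requirement, dictates the size of the coupled cluster, so half of its servers exist only to hold disks. With storage decoupled, the server count follows compute demand alone.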
Drive Extreme Performance and High Availability
NetApp® E/EF-Series storage systems can turbocharge access to your data and increase its value. Requiring just 2U of rack space, the systems combine up to 1M IOPS, sub-100-microsecond response times, and up to 21GBps of bandwidth with leading, enterprise-proven availability features.
Reduce Hadoop Replication Factor and Improve TCO
With NetApp Dynamic Disk Pool (DDP) technology on a NetApp E/EF-Series storage system, the data and parity information are distributed across a pool of drives. Because the storage layer already protects data against drive failures in this way, NetApp recommends using a replication factor of 2 instead of 3 when using an E-Series storage system. The lower replication factor puts less load on the network, so jobs complete faster. Also, the reduction in the number of data copies saves money by requiring less storage.
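As a rough illustration of the storage saving, the sketch below compares the capacity consumed by HDFS block copies at replication factors of 3 and 2. The 100TB dataset size is a hypothetical figure, and the calculation deliberately ignores DDP parity overhead and free-space headroom, so treat it as an upper bound on the saving rather than a sizing result. In HDFS, the cluster-wide default replication factor is set with the `dfs.replication` property in `hdfs-site.xml`.

```python
def replicated_capacity_tb(dataset_tb, replication_factor):
    """Capacity consumed by HDFS block copies (parity overhead not included)."""
    return dataset_tb * replication_factor

def savings_fraction(old_rf, new_rf):
    """Fraction of replicated capacity saved by lowering the replication factor."""
    return (old_rf - new_rf) / old_rf

dataset_tb = 100  # hypothetical dataset size

print(replicated_capacity_tb(dataset_tb, 3))  # 300
print(replicated_capacity_tb(dataset_tb, 2))  # 200
print(round(savings_fraction(3, 2), 3))       # 0.333
```

Each block written to HDFS is also sent over the network once per additional replica, so dropping from three copies to two removes one of the two replica transfers per block.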
Maintain Enterprise-Class Data Protection
With five issued patents, DDP technology is designed to deliver worry-free storage through effortless management and self-optimization while maintaining predictable performance under any condition, including recovery from drive failures. With up to 4 times faster rebuild, DDP technology significantly reduces exposure to multiple cascading disk failures, providing excellent data protection.
Enable Enterprise-Level Scalability and Flexibility
The NetApp SANtricity® software that comes with the E/EF-Series systems offers an intuitive and user-friendly interface from which extra storage (up to 4.8PB) can be added seamlessly. This flexibility makes it convenient to scale the storage capacity up or down as needed without affecting any running jobs.
Easy to Deploy and to Use
There is a steep learning curve for customers who are new to Hadoop. Few enterprise applications are built to run on massively parallel clusters. However, the NetApp E/EF-Series solution for Hadoop provides an operational model for a Hadoop cluster that does not require additional attention after its initial setup. The cluster is stable and easy to maintain, allowing you to concentrate on meeting your business needs. This solution flattens the operational learning curve of Hadoop.
Hortonworks is a leading innovator in the industry, creating, distributing, and supporting enterprise-ready open data platforms and modern data applications. They actively contribute to open source communities like Apache Hadoop, NiFi, and Spark. NetApp has partnered with Hortonworks to certify the NetApp Hadoop solutions. For more information, refer to the Hortonworks website. The following table shows all the E/EF-Series products and network protocols that you can use to create your Hadoop cluster using NetApp and Hortonworks.
For more details about why you should use NetApp for Hortonworks, refer to the solution brief NetApp Solutions for Hortonworks. To learn more about all the NetApp solutions for big data analytics, refer to the NetApp big data analytics website. If you are attending the Strata Data Conference in New York City, September 11-13, 2018, visit us at booth #1439. NetApp is a Platinum Sponsor of the conference.