Just a few weeks ago, NetApp celebrated its 25th anniversary. That’s 25 years of helping banks, governments, hospitals, manufacturers, and many other enterprises store, protect, and manage their data — the asset that is changing the world today in so many dimensions.
Of course, many factors have contributed to NetApp’s success over the past quarter century. But the elements I’d like to focus on today are NetApp’s solutions and the ecosystem around them. As the IT world embraces “software-defined everything,” and as machines and sensors join human beings in generating data, tight integration across the solution stack is increasingly important, especially at the software and application layers.
Over the past 25 years, data has assumed different shapes, speeds, and volumes, and NetApp has evolved to handle the requirements of these varying forms of data — which leads to the next part of this discussion, NetApp for big data analytics.
One misleading founding slogan of big data was, “Get away from expensive enterprise storage and build your big data platform on cost-effective off-the-shelf hardware.” I call it misleading because studies have shown that NetApp solutions can bring TCO down by almost 30% compared with off-the-shelf installations, without compromising the performance or functionality of the overall solution.
Another misleading statement concerned data locality and the belief that Hadoop deployments require data on local drives to function properly. However, Ananthanarayanan et al., in their study at the University of California, Berkeley, argue that disk locality is becoming irrelevant in cluster computing.
Both statements had merit when networks were limited to slow 1Gb connectivity and the gap between drive throughput and network bandwidth was a real concern. With today’s high-bandwidth networks, direct SAS connections, and the recently introduced NVMe over Fabrics, however, they no longer hold. Cloud providers also debunked them when they began offering Hadoop and analytics services backed by cloud storage.
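A back-of-the-envelope comparison makes the shift concrete. The numbers below are illustrative, not benchmarks: roughly how long it takes to move a terabyte over network links of different speeds, versus streaming it from a single local drive.

```python
def transfer_seconds(data_bytes: float, link_gbps: float) -> float:
    """Seconds to move data_bytes over a link running at link_gbps gigabits/s."""
    return data_bytes * 8 / (link_gbps * 1e9)

ONE_TB = 1e12  # bytes

# 1GbE era: the network is the bottleneck, so keeping data local mattered.
# A single spinning drive streams roughly 1.6 Gb/s (~200 MB/s), faster than 1GbE.
print(transfer_seconds(ONE_TB, 1))    # ~8000 s to move 1 TB over 1GbE

# Modern fabrics: the link outruns the drive by a wide margin,
# so reading over the network costs little versus reading locally.
print(transfer_seconds(ONE_TB, 100))  # ~80 s to move 1 TB over a 100Gb link
```

With the network no longer the slowest component, the case for forcing data onto local drives largely disappears.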
Decoupling server and storage in big data analytics architectures has proved to be a solid choice for many enterprises that capture and analyze large amounts of structured and unstructured data and want to realize the full benefits of their deployments without running into infrastructure resilience and agility issues. This is especially true for Hadoop and Splunk environments, where ever-increasing data ingest rates place heavy demands on infrastructure scalability and performance.
NetApp® solutions for Hadoop feature enterprise storage building blocks that are independent of the compute servers, offering an enterprise-class deployment with lower cluster downtime, higher data availability, and linear scalability. If a disk fails on a NetApp E-Series storage system running Dynamic Disk Pools technology, performance is negligibly affected, and recovery is 10x faster than with typical RAID schemes on internal storage. New data nodes can be added nondisruptively, with no rebalancing or migration needed. External data protection reduces both the storage footprint and the data replication overhead. In addition, the NetApp NFS connector for Hadoop lets you run your big data analytics workloads — batch, in-memory, or streaming — in place, without moving the data or creating a separate analytics silo. Thanks to our technology partnerships, NetApp solutions for Hadoop are tested, validated, and certified with the major distributions: MapR, Hortonworks, and Cloudera.
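To sketch how the NFS connector plugs into a Hadoop cluster, the client side is configured in core-site.xml along these lines. The property names, class name, and server address below are illustrative assumptions and should be verified against the connector’s documentation for your distribution.

```xml
<!-- core-site.xml — illustrative sketch only; verify property and class
     names against the NetApp NFS Connector for Hadoop documentation -->
<configuration>
  <!-- Register the NFS file system implementation (assumed class name) -->
  <property>
    <name>fs.nfs.impl</name>
    <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
  </property>
  <!-- Point the cluster at the NFS export (hypothetical address) -->
  <property>
    <name>fs.defaultFS</name>
    <value>nfs://nfs-server.example.com:2049</value>
  </property>
</configuration>
```

Once the file system is registered this way, MapReduce and Spark jobs address NFS paths just as they would HDFS paths, which is what allows the data to stay in place.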
NetApp solutions for Splunk deliver the same benefits. NetApp E-Series building blocks support the Splunk hot, warm, cold, and frozen data tier model in a single architecture: operations teams can accelerate data indexing and searching with flash while minimizing cost and footprint for colder data with high-capacity near-line SAS drives.
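In Splunk terms, those tiers map onto volumes in indexes.conf. A minimal sketch, assuming a flash-backed mount for hot/warm buckets and a capacity-drive mount for cold; all paths and sizes here are examples, not recommendations:

```ini
# indexes.conf — illustrative sketch; paths and sizes are examples only

[volume:flash]
path = /mnt/eseries_flash            # SSD-backed LUN for hot/warm buckets
maxVolumeDataSizeMB = 2000000

[volume:capacity]
path = /mnt/eseries_nlsas            # near-line SAS LUN for cold buckets
maxVolumeDataSizeMB = 20000000

[main]
homePath   = volume:flash/main/db           # hot + warm tier
coldPath   = volume:capacity/main/colddb    # cold tier
thawedPath = $SPLUNK_DB/main/thaweddb       # thawed buckets (no volume refs)
coldToFrozenDir = /mnt/archive/main/frozen  # frozen tier (archive)
frozenTimePeriodInSecs = 31536000           # roll to frozen after ~1 year
```

Because each tier is just a mount point, the same index definition works whether the volumes sit on internal drives or on external E-Series LUNs.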
NetApp’s value-add for big data analytics doesn’t end with decoupling server and storage or increasing performance. With our Data Fabric strategy, you can enable a hybrid cloud, bringing extensive data management features to your big data analytics environment.
As shown in the following diagram, NetApp solutions for MongoDB NoSQL databases use the NetApp Data Fabric to move data securely across multiple cloud choices, enabled by our ONTAP® cloud software and NetApp Private Storage for the cloud. In addition, the solution delivers the industry’s most efficient and comprehensive integrated data protection suite, on premises or in the cloud, to the world of NoSQL databases.
When we look at the total solution, it’s clear that the benefits discussed in this article translate directly into use-case value. For example, higher performance translates into faster real-time analytics, which is crucial in fraud detection and stock-exchange monitoring.
We can help you monetize your data faster and at lower cost, ensure that you spend your time on driving business and building data pipelines rather than on managing hardware and failures, and enable you to protect and govern your big data, even in hybrid cloud environments. NetApp and our great technology partners are on the front lines in leading the digital transformation era.
In the end, data is what matters, and NetApp has been leading the industry for 25 years. Consult with us on your big data analytics projects and let us help you gain maximum value from your data, increase your competitive advantage, and build data-driven business strategies in a faster, more performant, and more reliable way.
In the following video, “Change the World with Big Data Analytics and NetApp,” you can watch the conversation I had with NetApp Data Strategist and Director of Technology Matt Watts around the latest trends in big data analytics and the challenges our customers tell us that they’re facing. Matt and I also discuss how NetApp is enabling our customers to change the world with data.