This post is part 1 of a five-part series that explains how infrastructure analytics can be used to improve IT service delivery and reduce costs in a hybrid cloud environment. For a deeper dive, download the white paper “Data Insights and Control for the Service Provider Business Model.”

Do you want to know about IT bottlenecks before they escalate into major problems? That’s what an infrastructure analytics solution can do for you. It helps you find that needle in a haystack 90% faster than a traditional approach.

Before I explain how it works, let’s consider the consequences of “business as usual” and why IT organizations can no longer rely on a traditional approach to troubleshooting performance problems.

Your customers have lots of choices, and problems with your company’s online experience can translate directly into lost business and a damaged brand. Whether a problem manifests as a latency spike or an application crash, it ultimately hurts customer satisfaction.

However, your IT environment is more complex than ever. You have a mandate to modernize, innovate, and above all, act quickly. On-premises data centers need to be updated and new cloud services deployed to support innovation and enable new levels of agility. Increasingly, your enterprise IT infrastructure must be operated and managed as a multisite, hybrid cloud environment.

With all that complexity, finding a problem can be like looking for the proverbial needle in a haystack. While you spend time searching for answers—why did that application come to a grinding halt?—your customer satisfaction is plummeting.

This is where infrastructure analytics come to the rescue. The anomalies in infrastructure behavior that an analytics solution uncovers can lead you to the source of a problem in minutes, instead of hours or days. Infrastructure analytics let you fix problems before they escalate into emergencies. So you can be a hero—instead of the one who gets the blame when it all goes wrong.

It’s as simple as 1, 2, 3.

1. Know What You Have and How It’s Working

Infrastructure analytics solutions, such as NetApp® OnCommand® Insight (OCI), gather data about all of your infrastructure, across multiple clouds and multiple vendors. This data gives you a detailed understanding of your configurations, applications, VMs, networks, and storage, along with their performance attributes. All of this information is stored in a data warehouse that is continuously updated and that can be examined for new insights. For example, analytics can uncover patterns that indicate what is normal and what is outside the range of predicted behavior based on historical data.
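
To make the idea concrete, here is a minimal sketch of how a “normal range” might be derived from historical data. This is not OCI code; the metric, sample values, and the mean-plus-three-standard-deviations rule are all illustrative assumptions.

```python
from statistics import mean, stdev

def baseline(history):
    """Derive a 'normal range' for a metric from historical samples.

    Returns (low, high) as mean +/- 3 standard deviations. Real products
    use far more sophisticated models, but the idea is the same.
    """
    mu, sigma = mean(history), stdev(history)
    return mu - 3 * sigma, mu + 3 * sigma

# Hypothetical history of volume latency samples (ms), one per interval
latency_history = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.4, 1.3]
low, high = baseline(latency_history)
print(f"Expected latency range: {low:.2f}-{high:.2f} ms")
```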

2. Find the Anomalies

The traditional method of problem detection is to set thresholds. When those thresholds are breached, you get an alert. The trouble with this method is that many of those alerts flag behavior that is perfectly normal. So you set the thresholds higher and higher to reduce false alarms until, boom, you miss the alert that really mattered. By combining business rules with usage patterns across your entire infrastructure over time, infrastructure analytics and machine learning can deliver alerts only when a situation truly needs your attention.
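
The difference is easy to see in a small sketch. Again, this is illustrative rather than how OCI works internally: the fixed threshold and the learned range are made-up numbers.

```python
STATIC_THRESHOLD_MS = 2.0  # illustrative fixed threshold

def static_alert(sample_ms):
    # Fires on every excursion, whether or not it is unusual for this workload.
    return sample_ms > STATIC_THRESHOLD_MS

def baseline_alert(sample_ms, low, high):
    # Fires only when the sample falls outside the range learned from history.
    return not (low <= sample_ms <= high)

sample = 2.3
print(static_alert(sample))              # True: noisy if 2.3 ms is normal here
print(baseline_alert(sample, 0.8, 2.6))  # False: within this workload's normal range
```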

3. Pinpoint the Problem

Once you’ve been notified of some anomalous behavior, what do you do? The next step is to find the root cause. Without insight into the complete environment, this means using multiple tools across numerous silos, which might require you to open hundreds of application windows and take copious notes to figure out what went wrong. However, by using a correlation engine on your data warehouse, you can mine data across the entire infrastructure to easily find the likely culprit. Knowing which elements have a high correlation to each other can quickly tell you where to look. And that’s how you find that needle in the haystack!
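
As a rough sketch of the correlation idea (the metric names and values are hypothetical, and this is not the actual OCI correlation engine), you can rank candidate infrastructure metrics by how strongly they move with the misbehaving application metric and investigate the top hits first.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical time series collected over the same intervals as the app's latency.
app_latency = [1.2, 1.3, 1.4, 2.8, 3.1, 3.0, 1.5, 1.3]
candidates = {
    "storage_port_util": [0.30, 0.32, 0.35, 0.88, 0.91, 0.90, 0.40, 0.33],
    "switch_errors":     [0, 0, 1, 0, 0, 1, 0, 0],
    "vm_cpu_util":       [0.55, 0.52, 0.57, 0.60, 0.58, 0.56, 0.54, 0.53],
}

# Rank candidate metrics by absolute correlation with the problem metric.
ranked = sorted(
    candidates.items(),
    key=lambda kv: abs(correlation(app_latency, kv[1])),
    reverse=True,
)
for name, series in ranked:
    print(f"{name}: r = {correlation(app_latency, series):+.2f}")
```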

Watch this short demo to see how easy it can be.

Infrastructure analytics allow you to fix problems before they impact your business. Let’s look at the Cerner experience as a concrete example. Cerner has a massive IT environment spanning more than 115 petabytes of storage, 400 storage arrays, and 60,000 virtual machines.

With such a large and complex multisite environment to monitor, the Cerner team needed better tools to help them meet and exceed their SLAs. To do this, they used the predictive machine learning capability of OCI to identify and avoid problems before they impacted the business. OCI gave Cerner end-to-end visibility, from the virtual or physical host to the switch to the storage infrastructure, across their entire heterogeneous environment.

With OCI, the Cerner team has moved from reactive to proactive, and now to predictive, problem avoidance using a single root-cause dashboard. They have substantially reduced their MTTR and troubleshooting time. To learn more about the Cerner IT experience, watch this on-demand webcast in which Jim Avazpour, director of Global Operations, joins me to discuss the challenges of monitoring a massive, multisite IT infrastructure.

Kurt Sand