Not that long ago, you could walk into a data center and count the vendor logos on the fingers of one hand, or two at most, but no toes required. Wow, have times changed. Now you’ll find multiple flash storage vendors and hyperconverged systems, and maybe even some open source software like OpenStack. In many data centers, you’ll also find infrastructure as a service (IaaS) from multiple cloud providers. And don’t expect the runaway growth in diversity and complexity to slow down any time soon.

 

How do we, as technologists, respond to the need to manage more complex technology? By creating technology to help us manage the technology, of course. This is where IT infrastructure monitoring software enters the picture, acting as a force multiplier by bringing configuration and performance data from all of those disparate hybrid cloud systems into a single system for analysis.

 

However, getting all of the infrastructure data into one place is necessary but not sufficient to solve the problem. Advanced analytics are needed to sift through the vast amounts of data and pinpoint problem areas. In my recent blog post, I discussed applying machine learning to the data. Although machine learning is a powerful solution, at some point the end user still needs to see and touch the data.

 

Today’s UIs present information in a flexible framework of widgets and dashboards so that end users can look at the data in the format that makes the most sense to them. I have shown these types of visualizations to customers many times, and every time they ask the same question: “Can I drill down?”

 

It got me thinking — what is this obsession with drilling down? Here are my thoughts.

 

We never trust any data that is summarized for us unless we have access to the source. 

We’ve been trained not to blindly accept conclusions, but rather to look at how the conclusion was arrived at before trusting it. When we see repeatable and believable conclusions, our trust grows.

 

We love treasure hunting. We’re convinced that the smoking gun is just three clicks away, if we only keep looking.

Sometimes our intuition is right and we do in fact arrive at the correct answer. But given the vast amounts of data being collected, hunting by hand is becoming less effective and is a huge time sink.

 

We are curious beasts. Put something in front of us and we click just to see what happens.

We all know what happened to the curious cat, so I won’t say any more about this.

 

But what seems odd to me is that customers miss the more important question: “Where should I drill?”

 

If you’ve had a water well drilled on your property, you’ve faced this question. How do you pick a likely spot?

 

If you have a Y-shaped twig handy, you can use it as a divining rod, apply some mystical forces, and find water. Or you can hire a professional dowser to do the divining for you. (Interestingly, a few studies have claimed that dowsers do better than blind luck, although controlled trials have generally found no such effect.)

 

Apply this principle to infrastructure monitoring, where there are thousands upon thousands of metrics, often collected from shared infrastructure. How do you know where to drill to find the root cause of an issue? For example, think of an environment where many VMs share the same storage system. If that storage system starts to exhibit high latency, how do you know which VM is causing the issue?

 

The NetApp® OnCommand® Insight (OCI) team has been studying this problem for many years, and we’ve built a “divining rod” into the product. Using correlation, OCI can identify which resources are likely to have an impact on others. These are called greedy resources because they consume more than their fair share of the shared resources. This is the “divining rod” that shows you where to start looking. Now you know where to drill down, and you can apply your knowledge and experience to get to the root cause much faster. So drill, baby, drill!
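To make the correlation idea a little more concrete, here is a minimal sketch in Python. It is not OCI's actual algorithm, which this post does not describe. It simply assumes you have a latency time series for the shared storage system and an IOPS time series for each VM, and it ranks VMs by how closely their load tracks the latency. The function names, VM names, and sample numbers are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_greedy_candidates(latency_ms, vm_iops):
    """Rank VMs by how strongly their IOPS track shared-storage latency.

    Returns (vm_name, correlation, share_of_total_iops) tuples,
    most strongly correlated first.
    """
    total = sum(sum(series) for series in vm_iops.values())
    ranked = []
    for name, series in vm_iops.items():
        share = sum(series) / total if total else 0.0
        ranked.append((name, pearson(series, latency_ms), share))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical samples taken at the same six points in time.
latency_ms = [2.0, 2.1, 2.0, 6.5, 7.0, 2.2]
vm_iops = {
    "vm-a": [100, 110, 105, 100, 108, 102],
    "vm-b": [200, 210, 190, 900, 950, 205],
    "vm-c": [50, 40, 60, 45, 55, 50],
}

ranking = rank_greedy_candidates(latency_ms, vm_iops)
for name, corr, share in ranking:
    print(f"{name}: correlation={corr:+.2f}, share of IOPS={share:.0%}")
```

In this made-up data, the VM whose IOPS spike at the same moments as the latency rises to the top of the list, and it also holds the largest share of the total load. Correlation alone does not prove cause, so treat the ranking as a hint about where to drill first, not as the answer.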

 

NetApp OnCommand Insight is an open platform for on-premises and hybrid cloud data center management. To learn all about it, download the OCI datasheet.


Kurt Sand