Source of the article: http://blogs.vmware.com/management/2016/08/data-granularity-matters-monitoring.html
There is a wide variety of solutions on the market, all claiming to have various levels of machine learning, adaptive baselines, market-driven analytics, learning algorithms, etc. By and large they all collect the data the same way: poll vCenter for performance data via the real-time APIs and, at the same time or less frequently, poll all the other aspects of vSphere (inventory, hierarchy, tasks & events, etc.), then store that data at varying levels of granularity.
With every solution collecting the data from the same place, you would expect a straight comparison of one algorithm against another, but it turns out that is not the case. Two aspects greatly influence the analytics: frequency of polling and data retention.
Frequency of polling is pretty straightforward. Polling data faster equates to more data points and a better chance of catching peaks, valleys, and general usage. However, faster polling comes with a performance cost every X minutes/seconds (on vCenter, the data collector, and the solution's database) and a huge longer-term impact on storing that data. Ideally, there should be some middle ground on collecting the data.
Most solutions poll every 15 minutes. Some of these can be tuned down (good), and unfortunately many cannot (not so good). Those that can go lower generally stop at polling vSphere every 5 minutes. Five minutes seems like an eternity to anyone focused on performance monitoring and analytics. Fortunately, the vCenter API offers the ability to pull 20-second data for the last 5 minutes, which gets around most complaints: 5 minutes (300 seconds) of history / 20-second point-in-time cycles = 15 data points.
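To make that concrete, here is a minimal pyVmomi sketch of pulling those 15 real-time samples for a single metric on a single VM. The host, credentials, VM selection, and counter name are placeholders and not from the article; the key detail is intervalId=20 (the real-time 20-second interval) with maxSample=15 (the last 5 minutes).

```python
# Minimal pyVmomi sketch: pull the last 5 minutes of 20-second samples for one VM.
# Host, credentials, VM selection, and the counter name are illustrative placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; validate certs in production
si = SmartConnect(host="vcenter.example.com", user="user", pwd="pass", sslContext=ctx)
content = si.RetrieveContent()
perf = content.perfManager

# Build a lookup of counter names to counter ids, e.g. "mem.usage.average"
counters = {f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
            for c in perf.perfCounter}
counter_id = counters["mem.usage.average"]

# Grab a VM from the inventory (first one found, for brevity)
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = view.view[0]

# intervalId=20 selects the real-time stats; maxSample=15 -> 15 x 20s = 5 minutes
spec = vim.PerfQuerySpec(
    entity=vm,
    metricId=[vim.PerfMetricId(counterId=counter_id, instance="")],
    intervalId=20,
    maxSample=15,
)
result = perf.QueryPerf(querySpec=[spec])
for series in result[0].value:
    print(series.value)  # 15 raw samples, one per 20-second interval

Disconnect(si)
```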
One would think it is not hard to pull 15 data points every 5 minutes, but every object can have dozens of metrics and properties to collect. In a smaller environment this might be doable, but at large scale it can mean enormous data sets to poll, forward, and store (see the back-of-the-envelope math below). That data can expose weaknesses in the core platform of the solutions, and thus 15-minute or less frequent polling is enforced.
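As a rough illustration of "enormous," here is a quick back-of-the-envelope calculation. The VM count and metrics-per-VM numbers below are assumed for the example, not figures from the article.

```python
# Back-of-the-envelope sketch (assumed numbers): how 20-second samples add up at scale.
vms = 1000                      # monitored VMs (assumed)
metrics_per_vm = 50             # counters collected per VM (assumed)
samples_per_cycle = 15          # 5 minutes of 20-second data per collection cycle
cycles_per_day = 24 * 60 // 5   # one collection cycle every 5 minutes = 288

points_per_day = vms * metrics_per_vm * samples_per_cycle * cycles_per_day
print(f"{points_per_day:,} data points per day")  # 216,000,000
```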
Even still, all of that data has to sit somewhere and be analyzed. The algorithms need historical context in order to consider 'what might happen' in 10 minutes, tomorrow, or next week. What are the business cycles of a given application? How granularly the data is stored over time, and thus how much of it is available for those super-smart algorithms to analyze, is key to answering those sorts of questions.
For a few solutions, raw data is kept forever or for a configurable amount of time, ideally long enough to analyze full business cycles. The unfortunate answer in most cases, however, is that it is not stored very granularly. Current data is fed in and stored at a highly granular state for a few days, then rolled up over time into hourly or daily chunks. In practice this works well for analyzing short-term spikes, but anything involving longer-term trends comes with a harsh penalty.
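A toy example (not from the article) shows why the roll-up penalty hurts: average a short spike into an hourly bucket and it effectively vanishes.

```python
# Illustrative only: rolling 5-minute samples up to an hourly average hides a short spike.
samples = [40] * 12   # one hour of 5-minute memory-demand samples (%), assumed flat at 40%
samples[7] = 106      # a single 5-minute spike

hourly_average = sum(samples) / len(samples)
print(max(samples))           # 106 -> visible in the raw 5-minute data
print(round(hourly_average))  # 46  -> the spike disappears in the hourly roll-up
```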
Let’s take a look at this in graphical form. I’m going to use vRealize Operations to visualize the data below in the charts. In Figure 1 we see the raw data coming in over a period of a few days with 5-minute granularity. We can see the peak value of 106% memory demand.
Figure 1.