Reveal True Data Dispersion with CV and QCD


A guide to computing and interpreting the Coefficient of Variation and the Quartile Coefficient of Dispersion

Esmaeil Alizadeh

Towards Data Science

Image generated by Author using StockImg.AI

We’ve all heard the saying, “Variety is the spice of life,” and in data, that variety or diversity often takes the form of dispersion.

Dispersion is what makes data interesting: it exposes patterns and insights we would otherwise miss. The usual measures of dispersion are the variance, standard deviation, range, and interquartile range (IQR). In some cases, however, we need to look beyond these typical measures.

This is where the Coefficient of Variation (CV) and Quartile Coefficient of Dispersion (QCD) provide insights when comparing datasets.

In this tutorial, we will explore the two concepts of CV and QCD and answer the following questions for each of them:

  • What are they, and how are they defined?
  • How can they be computed?
  • How can the results be interpreted?

We will answer each of these questions in detail, using two examples.

Whether we’re measuring people’s heights or housing prices, we seldom find all data points to be the same: some people are tall, some are short, and most fall somewhere in between. To study this variability, or dispersion, we usually quantify it with measures such as the range, variance, and standard deviation, which describe how spread out the data points are.

However, what if we want to compare variability across datasets? For example, what if we want to compare the spread of sale prices at a jewelry shop with that at a bookstore? Standard deviation alone won’t work here: it is expressed in the units of the data and scales with the magnitude of the values, and the scales of the two datasets are likely very different.

The CV and QCD are useful indicators of dispersion in this context.
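
To make the jewelry shop versus bookstore comparison concrete, here is a minimal Python sketch. It assumes the usual textbook definitions: CV is the standard deviation divided by the mean, and QCD is (Q3 - Q1) / (Q3 + Q1). The price lists are hypothetical, made up purely for illustration. The sample standard deviation and the default quartile method of Python's statistics module are choices made for this sketch, and other quartile conventions will give slightly different QCD values.

```python
import statistics


def coefficient_of_variation(values):
    """CV: sample standard deviation divided by the mean (assumes a nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def quartile_coefficient_of_dispersion(values):
    """QCD: (Q3 - Q1) / (Q3 + Q1), with quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)


# Hypothetical sale prices, invented for illustration only.
jewelry_prices = [250, 400, 520, 610, 780, 950, 1200, 3000]
book_prices = [8, 10, 12, 14, 15, 18, 22, 30]

for name, prices in [("jewelry", jewelry_prices), ("books", book_prices)]:
    sd = statistics.stdev(prices)
    cv = coefficient_of_variation(prices)
    qcd = quartile_coefficient_of_dispersion(prices)
    print(f"{name}: sd={sd:.2f}, CV={cv:.2f}, QCD={qcd:.2f}")
```

The standard deviations differ by orders of magnitude simply because jewelry costs more than books. CV and QCD are unitless ratios, so multiplying every price by a constant (say, converting currencies) leaves them unchanged, which is what makes them comparable across the two shops.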
