If you’re new to the world of data science, you’ll know that a lack of knowledge in statistics can be frustrating and can hinder your progress. That makes it important to know at least the basics. In this post, we’re going back to those basics and looking at the two major types of statistics: descriptive statistics and inferential statistics. You can probably tell from the names what the two types represent, but we’ll still go through what they mean. Let’s start with descriptive statistics.
Descriptive statistics is used to describe a large set of data in the form of a summary. Imagine you’re working on a hair conditioning product that is supposed to reduce baldness in men, so you want to know the primary cause of baldness in men. You can’t go around asking every bald man on Earth what caused his baldness; it just isn’t practical. Instead, you select a group of bald men from your local or extended community, say 500 men in total. You survey these men and get a general idea of what the various causes of baldness are. The idea is that what you learn from this group can then be applied, so to speak, to the entire population of bald men on Earth. The small set of people you surveyed is called the sample, and the full group you care about is called the population. The data you collected is then summarised, for example as a short table listing each reported cause alongside the number of men who named it.
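As a rough sketch of that summarising step, here is how 500 raw answers could be collapsed into a four-row table in Python. The data is made up for illustration: only "hereditary" and "water quality" are causes mentioned in this post, while the "stress" and "other" categories and all of the counts are assumptions.

```python
from collections import Counter

# Hypothetical survey answers. Only "hereditary" and "water quality" appear in
# the post; the other two categories and every count are invented placeholders.
answers = (
    ["hereditary"] * 220
    + ["water quality"] * 150
    + ["stress"] * 90
    + ["other"] * 40
)


def summarise(responses):
    """Return (cause, count, share) rows, most common cause first."""
    total = len(responses)
    return [
        (cause, count, count / total)
        for cause, count in Counter(responses).most_common()
    ]


summary = summarise(answers)

# Print the 500 answers as a small frequency table.
print(f"{'Cause':<15}{'Count':>6}{'Share':>8}")
for cause, count, share in summary:
    print(f"{cause:<15}{count:>6}{share:>8.1%}")
```

A frequency table like this is usually the first descriptive summary you would build for categorical answers, because it keeps every response but shows it at a glance.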
Even though you collected 500 answers, you summarised them in just four rows. This is descriptive statistics: the summary describes the sample you surveyed, and you treat it as representative of the entire population. But it doesn’t stop here. You can derive a lot more from the data. For numeric answers, you can calculate the mean, median, maximum and minimum. You can look at the standard deviation to understand the spread, at the skewness to see whether the distribution leans to one side, and at the kurtosis to see how heavy its tails are. There’s a lot you can do with the data you have that is descriptive of the problem you’re dealing with.
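To make those measures concrete, here is a minimal sketch using only Python’s standard library. The `ages` list is a hypothetical numeric answer (the age at which each man first noticed hair loss) that is not part of the survey described above. Skewness and excess kurtosis are computed with the simple population-moment formulas, which is an assumption on my part; other tools may apply bias corrections and give slightly different numbers.

```python
import statistics

# Hypothetical numeric answers: age at which hair loss was first noticed.
ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 41, 45, 58]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation

    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std_dev": std_dev,
        # Positive skewness means a longer tail on the right.
        "skewness": m3 / std_dev**3,
        # Excess kurtosis is 0 for a normal distribution.
        "excess_kurtosis": m4 / std_dev**4 - 3,
    }


stats = describe(ages)
for name, value in stats.items():
    print(f"{name:<16}{value:>8.2f}")
```

In this made-up data the single 58-year-old pulls the mean above the median, and the positive skewness reports the same thing as a single number.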
This is descriptive statistics. I hope that was descriptive enough. 🙂
As the name suggests, inferential statistics is used to infer, or deduce, conclusions from the data we have. We’ll continue with the previous example of male baldness. In the previous section, the descriptive statistics showed that the major cause of baldness, according to the men surveyed, is heredity. There’s no medicine for that, so we’ll look at the next major cause, which is the quality of water. From this we can infer that if we improve the quality of the water people use to shower, we might reduce male baldness. We could come up with a treatment that reduces the pollutants in the water, or adds minerals to improve its quality, which might help reduce hair fall.
I hope you see where I’m going with this. Any time you use sample data to reach a conclusion about a hypothesis, or to make predictions about the wider population, building on an earlier phase of descriptive statistics, you’re doing inferential statistics.
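One common way to put a number on such an inference is a confidence interval. The sketch below estimates what share of all bald men would name water quality, based on the sample alone. It uses the normal-approximation (Wald) interval for a proportion, and the figure of 150 out of 500 is a hypothetical count, not a result from the survey. It also assumes the sample was drawn at random from the population, which a survey of one local community would not strictly satisfy.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n  # proportion observed in the sample
    # z-value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical count: 150 of the 500 surveyed men named water quality.
low, high = proportion_confidence_interval(150, 500)
print(f"Estimated population share: {low:.1%} to {high:.1%}")
```

The sample proportion on its own is descriptive. The interval around it is the inferential part, because it makes a statement about men who were never surveyed.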