Statistics 101: Understanding Normal Distribution

In the last blog, we discussed different types of distributions. The Normal Distribution is one of the most important. Its importance can be appreciated once we understand the Mean and Standard Deviation concepts.

The normal distribution, also known as the Gaussian distribution, was popularized by Carl Friedrich Gauss. Statisticians of the 19th century observed that many natural phenomena followed this distribution, so the name ‘Normal Distribution’ stuck. The term “normal curve” was popularized by the naturalist Sir Francis Galton in his 1889 work, Natural Inheritance.

Measures of Central Tendency and Dispersion

The Mean of a distribution is the average of its data points: add up all the values and divide the sum by the number of values. Since the Mean shows the central point of a distribution, it is called a measure of central tendency. The Mean is typically represented by the symbol µ.

The Standard Deviation of a distribution shows how far the data points are spread out from the mean. Since it shows the spread, it is called a measure of dispersion. The Standard Deviation is typically represented by the symbol σ.
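
To make the two measures concrete, here is a minimal sketch in Python. The function names and the two small data sets are made up for illustration. It computes the population standard deviation (dividing by the number of points); a sample standard deviation would divide by one less than that.

```python
import math

def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)

def population_std(values):
    """Square root of the average squared distance from the mean."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# Two illustrative data sets with the same mean but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in (("narrow", narrow), ("wide", wide)):
    print(f"{name}: mean = {mean(data):.1f}, std = {population_std(data):.2f}")
```

Both sets are centred on 50, but the second one has a standard deviation ten times larger because its points sit ten times farther from the mean.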

Consider the distribution below. The drawing shows that the mean is the average value, and the standard deviation represents the data spread from the average value.

The figure above shows data with a larger variation than the figure below.

A Normal Distribution is symmetric around the mean (average). Values near the mean occur more frequently than values far from the mean.
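
Both properties can be checked on the density curve itself. This is a small sketch assuming Python 3.8 or later, whose standard library includes statistics.NormalDist. It uses a standard normal (mean 0, standard deviation 1) purely as an example.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)  # example parameters, not from the text

# Symmetry: the density is the same at equal distances on either side of the mean.
left = curve.pdf(-1.5)
right = curve.pdf(1.5)
print(f"density at -1.5: {left:.4f}, density at +1.5: {right:.4f}")

# Values near the mean are more likely: the density falls as we move away.
densities = [curve.pdf(d) for d in (0, 1, 2, 3)]
print([round(d, 4) for d in densities])
```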

The Importance of the Normal Distribution

One of the most important properties of a Normal Distribution is the empirical rule: about 68.3% of the observations fall within ± one standard deviation of the mean, about 95.4% fall within ± two standard deviations, and about 99.7% fall within ± three standard deviations.
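
The percentages in the empirical rule can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist. The helper name coverage is made up for this illustration.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage(k):.2%}")
# within ±1σ: 68.27%
# within ±2σ: 95.45%
# within ±3σ: 99.73%
```

The result does not depend on the particular mean or standard deviation, since any normal distribution can be rescaled to the standard one.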

Let’s take an example to understand how this empirical rule helps.

Suppose we measure the heights of a sample of a population and find a mean of 175 cm and a standard deviation of 10 cm. If the heights follow a normal distribution, we can predict that about 68% of the population will have a height between 165 and 185 cm (the mean ± one standard deviation), and about 95% will have a height between 155 and 195 cm (the mean ± two standard deviations).
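
The same prediction can be worked out in code. This is a sketch under the assumption that heights really are normally distributed with the mean and standard deviation from the example. The helper share_between is a hypothetical name, and statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist

# Assumed model: heights in cm, normally distributed.
heights = NormalDist(mu=175, sigma=10)

def share_between(low, high):
    """Fraction of the population expected to have a height between low and high."""
    return heights.cdf(high) - heights.cdf(low)

print(f"165 to 185 cm: {share_between(165, 185):.1%}")
print(f"155 to 195 cm: {share_between(155, 195):.1%}")

# The model also answers questions the rule of thumb does not cover directly,
# such as the share of people taller than 195 cm.
taller_than_195 = 1 - heights.cdf(195)
print(f"above 195 cm: {taller_than_195:.1%}")
```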

Just how ‘normal’ the Normal Distribution is can be explained by the Central Limit Theorem.

Central Limit Theorem

The distribution of sample means, calculated from repeated sampling, will tend to normality as the size of the samples gets larger.
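
A small simulation can illustrate this statement. The sketch below rolls a fair six-sided die, whose outcomes are uniform rather than normal, and looks at the means of many samples. The sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(sample_size, repetitions, rng):
    """Draw `repetitions` samples of `sample_size` die rolls and return each sample's mean."""
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(sample_size=30, repetitions=5000, rng=rng)

center = statistics.fmean(means)
spread = statistics.pstdev(means)

# If the sample means are roughly normal, about 68% of them should lie
# within one standard deviation of their own average.
within_one = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"average of sample means: {center:.3f}")
print(f"std of sample means:     {spread:.3f}")
print(f"share within ±1 std:     {within_one:.1%}")
```

A single die roll has a mean of 3.5, so the sample means should cluster around that value, with a spread much smaller than that of individual rolls.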

So if you run an experiment with large enough samples (a common rule of thumb is more than 30 observations per sample) and repeat this experiment many times, the means from those experiments will tend to follow a Normal Distribution, even when the underlying data do not.

In our next blog, we will delve into another interesting topic, Confidence Intervals.