Confidence Intervals clearly explained!

Understand the meaning of "95" in a "95% Confidence Interval"! 🚀

Confidence intervals are widely used yet poorly understood!

Today, I will clearly explain the meaning behind the "95" in a "95% confidence interval."

Before we dive into formal definitions and mathematics, let's first understand what a confidence interval is and why it is necessary with an example.

Suppose you want to estimate the average weight of all the individuals in your town.

You can't weigh every single person, but you can weigh a random sample.

So you pick a random sample of 50 individuals and compute their average weight.

Let's say the value comes out to be 72 kg.

But this is just the sample mean. How do we estimate the population mean? ❓

Well, you cannot pin down the population mean exactly from a sample!

However, you can estimate an interval around the sample mean that is likely to contain the population mean. This interval is what we call a confidence interval.

While estimating this interval, you also choose a confidence level:

Common choices are 90%, 95%, and 99%.

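The confidence level represents how confident we are that the procedure used to build the interval will capture the true population mean. All else being equal, a higher confidence level gives a wider interval.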
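For reference, the interval computed in the code that follows is the standard t-based interval for a mean. As a sketch, assuming the weights are roughly normally distributed (or the sample is reasonably large), with $\bar{x}$ the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $1 - \alpha$ the confidence level:

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
```

For a 95% level, $\alpha = 0.05$, so the critical value is the 97.5th percentile of the t-distribution with $n - 1$ degrees of freedom.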

Here's how it's done👇

You can access the code here👇

import numpy as np
import scipy.stats as stats

# Step 1: Generate a random sample of 50 human weights
sample_size = 50  # Number of people in the sample
average_weight = 70  # Average weight in kg
weight_variation = 15  # Standard deviation in kg

# Generate random weights for 50 people
random_weights = np.random.normal(average_weight, weight_variation, sample_size)

# Step 2: Calculate the sample mean (average) and sample standard deviation
sample_mean = np.mean(random_weights)
sample_std = np.std(random_weights, ddof=1)  # ddof=1 applies Bessel's correction (sample standard deviation)

# Step 3: Calculate the 95% confidence interval

# Find the t-critical value for 95% confidence and 49 (n-1) degrees of freedom
t_critical = stats.t.ppf((1 + 0.95) / 2, sample_size - 1)

# Calculate the margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Sample mean: {sample_mean:.2f} kg")
print(f"Sample standard deviation: {sample_std:.2f} kg")
print(f"95% CI: [{lower_bound:.2f}, {upper_bound:.2f}] kg")
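As a sanity check, the manual calculation can be compared against SciPy's built-in interval. This is a minimal sketch: the helper name `t_confidence_interval`, the seed, and the simulated data are assumptions made for illustration, not part of the original code.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for a t-based confidence interval of the mean."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    std_err = data.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value, e.g. the 97.5th percentile for 95% confidence
    t_critical = stats.t.ppf((1 + confidence) / 2, n - 1)
    margin = t_critical * std_err
    return mean - margin, mean + margin


# Hypothetical data: 50 simulated weights, seeded so the run is repeatable
rng = np.random.default_rng(0)
weights = rng.normal(70, 15, 50)

manual = t_confidence_interval(weights)
# SciPy's built-in equivalent; the confidence level is passed positionally
builtin = stats.t.interval(0.95, len(weights) - 1,
                           loc=weights.mean(), scale=stats.sem(weights))

print("manual :", manual)
print("builtin:", builtin)
```

Both lines should print the same pair of bounds, since `stats.sem` also uses `ddof=1` by default.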

🔹 Now we come to the second part: what does 95% confidence actually mean?

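The statement "we are 95% confident that the interval contains the true population parameter" does not mean that there is a 95% probability that the true parameter lies within the particular interval you computed. This is a common misconception. The true parameter is a fixed (if unknown) number, so any single computed interval either contains it or it doesn't; the 95% describes the procedure that generates the intervals.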

Suppose you were to repeat the sample collection many times, each time calculating a 95% confidence interval for the average weight of individuals.

You'd expect about 95% of those intervals to contain the true population average.

The other 5% of the time, you would expect the interval to miss the true average.
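This long-run behaviour can be checked with a small simulation. The sketch below assumes a made-up population (normally distributed weights with a known mean of 70 kg and standard deviation of 15 kg), which is only possible because the data is simulated; the function name `coverage_rate` and its defaults are illustrative.

```python
import numpy as np
from scipy import stats


def coverage_rate(confidence=0.95, true_mean=70.0, true_std=15.0,
                  sample_size=50, n_repeats=10_000, seed=0):
    """Fraction of simulated t-intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    # Each row is one independent sample of `sample_size` weights
    samples = rng.normal(true_mean, true_std, size=(n_repeats, sample_size))
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(sample_size)
    t_critical = stats.t.ppf((1 + confidence) / 2, sample_size - 1)
    lower = means - t_critical * std_errs
    upper = means + t_critical * std_errs
    # An interval "hits" when the true mean falls inside it
    hits = (lower <= true_mean) & (true_mean <= upper)
    return hits.mean()


print(f"Share of 95% intervals containing the true mean: {coverage_rate():.3f}")
```

The printed share should land close to 0.95, and passing a different `confidence` value should move it accordingly.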

🔸 Let's go back to our example: 

You take a sample and calculate a 95% confidence interval for the average weight of individuals to be [67, 75] kg.

Someone else takes another sample and finds a different 95% confidence interval, say [66, 74] kg.

Suppose this process is repeated many times, each time producing a new 95% confidence interval.

In the long run, you would expect about 95% of these intervals to contain the true average weight of the individuals, and about 5% to miss it.

Here's a faithful representation of a confidence interval. The picture below shows the true meaning of a 50% confidence interval👇
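The same idea can be sketched in text form: draw a handful of samples, build a 50% interval from each, and mark which ones contain the true mean. The population parameters, seed, and the helper `simulate_intervals` below are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Assumed population for illustration only (not from a real dataset)
TRUE_MEAN, TRUE_STD, SAMPLE_SIZE = 70.0, 15.0, 50


def simulate_intervals(confidence, n_intervals=20, seed=1):
    """Return a list of (lower, upper, contains_true_mean) tuples."""
    rng = np.random.default_rng(seed)
    t_critical = stats.t.ppf((1 + confidence) / 2, SAMPLE_SIZE - 1)
    results = []
    for _ in range(n_intervals):
        sample = rng.normal(TRUE_MEAN, TRUE_STD, SAMPLE_SIZE)
        margin = t_critical * sample.std(ddof=1) / np.sqrt(SAMPLE_SIZE)
        lower, upper = sample.mean() - margin, sample.mean() + margin
        results.append((lower, upper, bool(lower <= TRUE_MEAN <= upper)))
    return results


intervals = simulate_intervals(confidence=0.50)
for lower, upper, hit in intervals:
    mark = "contains true mean" if hit else "misses true mean"
    print(f"[{lower:6.2f}, {upper:6.2f}]  {mark}")

hits = sum(hit for _, _, hit in intervals)
print(f"{hits} of {len(intervals)} intervals contain the true mean of {TRUE_MEAN}")
```

With only 20 intervals the count will not be exactly half, but over many repetitions roughly 50% of such intervals should contain the true mean.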

That’s all for today, thanks for reading! 🙂 
