Estimation & Confidence Intervals
How do we estimate population parameters from sample data? That is the job of estimation, the bridge between descriptive statistics and hypothesis testing.
Point Estimation
A Point Estimate is a single value used to approximate a population parameter. For example, if you survey 100 customers and their average age is 35, you would estimate that the average age of all customers is 35. A good estimator (the rule that produces the estimate) has these properties:
- Unbiasedness: The expected value of the estimator equals the parameter (e.g., the sample mean $\bar{x}$ is an unbiased estimator of $\mu$).
- Efficiency: Among all unbiased estimators, it has the smallest variance.
- Consistency: As the sample size increases, the estimator converges to the true parameter.
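A quick simulation can make unbiasedness and consistency concrete. The sketch below assumes a hypothetical normal population with $\mu = 35$ and $\sigma = 10$ (echoing the customer-age example, not real data), draws many samples, and records the sample mean of each. The helper name `sample_means` is illustrative.

```python
import random
import statistics


def sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from a hypothetical N(mu, sigma)
    population and return the sample mean of each one."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]


MU, SIGMA = 35, 10  # hypothetical population parameters

small = sample_means(MU, SIGMA, n=10, trials=500)
large = sample_means(MU, SIGMA, n=400, trials=500, seed=1)

# Unbiasedness: the estimates average out close to MU.
print("average of x-bar (n=10): ", round(statistics.fmean(small), 2))
# Consistency: the spread of the estimates shrinks as n grows.
print("spread of x-bar (n=10):  ", round(statistics.stdev(small), 2))
print("spread of x-bar (n=400): ", round(statistics.stdev(large), 2))
```

The spread of $\bar{x}$ should land near $\sigma/\sqrt{n}$, roughly 3.2 for $n = 10$ and 0.5 for $n = 400$.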
Confidence Intervals (CI)
A point estimate is rarely exactly right. A Confidence Interval gives a range of plausible values for the true parameter, together with a confidence level (usually 90%, 95%, or 99%) that describes how reliable the procedure that builds the interval is.
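Think of it as casting a net. A 95% confidence level means that if you cast the net many times (drew many samples and built an interval from each), about 95% of the nets would catch the fish (the true mean). Any single net either contains the fish or it doesn't.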
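The net analogy can be checked by simulation. This sketch assumes a hypothetical normal population with known $\mu$ and $\sigma$, draws many samples of size 30, builds the 95% interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ for each (the $\sigma$-known formula), and counts how often the interval contains the true mean. The function name and parameter values are illustrative.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)  # same margin for every sample when sigma is known
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mu = 900, sigma = 100
rate = coverage_rate(mu=900, sigma=100, n=30, trials=2000)
print(f"{rate:.1%} of the intervals caught the true mean")
```

Expect a value close to 95%, even though each individual interval either contains $\mu$ or misses it.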
CI for Population Mean ($\sigma$ known)
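When we know the population standard deviation ($\sigma$) and either the population is normal or the sample size is large ($n \ge 30$), we use the Z-distribution:
$$ \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) $$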
Common Critical Values ($Z_{\alpha/2}$):
- 90% Confidence: 1.645
- 95% Confidence: 1.96
- 99% Confidence: 2.576
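These critical values are quantiles of the standard normal distribution: $Z_{\alpha/2}$ leaves probability $\alpha/2$ in the upper tail. As a sketch, Python's standard-library `statistics.NormalDist` (Python 3.8+) can reproduce the list; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Upper-tail quantile of the standard normal distribution
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: {z_critical(level):.3f}")
```

This prints 1.645, 1.960 and 2.576, matching the values above.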
Example: A factory produces light bulbs. We know $\sigma = 100$ hours. We test 50 bulbs ($n = 50$) and find a mean life of $\bar{x} = 900$ hours. Find the 95% CI.
$$ 900 \pm 1.96 \left( \frac{100}{\sqrt{50}} \right) $$
$$ 900 \pm 1.96(14.14) = 900 \pm 27.7 $$
Result: [872.3, 927.7] hours.
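The same arithmetic as a small function. This is a sketch that assumes `statistics.NormalDist` for the critical value, so it uses 1.95996... rather than the rounded 1.96; at this precision the result is unchanged. The name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for a population mean when sigma is known: xbar +/- Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)  # critical value times standard error
    return xbar - margin, xbar + margin


low, high = z_interval(xbar=900, sigma=100, n=50)
print(f"95% CI: [{low:.1f}, {high:.1f}] hours")  # bulb example from the text
```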
CI for Population Mean ($\sigma$ unknown)
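In practice, we rarely know $\sigma$, so we use the sample standard deviation ($s$) instead. To account for the extra uncertainty of estimating $\sigma$, we use the t-distribution with $n - 1$ degrees of freedom:
$$ \bar{x} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right) $$
Note: The t-distribution is flatter than the Z-distribution, with heavier tails, so its critical values are larger and the intervals wider. As $n$ increases, it approaches the Z-distribution.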
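Python's standard library has no t-distribution, so this sketch computes $t_{\alpha/2,\,n-1}$ from scratch: it integrates the t density with Simpson's rule and inverts the CDF by bisection. In practice you would typically call a statistics library instead (for example SciPy's `scipy.stats.t.ppf`). The sample summary ($n = 10$, $\bar{x} = 50$, $s = 5$) is hypothetical, and all function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    if x == 0:
        return 0.5
    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(confidence, df):
    """Two-sided critical value t_{alpha/2, df}, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0  # bracket wide enough for df >= 1 at 95% or 99%
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence=0.95):
    """CI for a mean when sigma is unknown: xbar +/- t * s / sqrt(n)."""
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=50, s=5, n=10)  # hypothetical sample summary
print(f"95% CI: [{low:.2f}, {high:.2f}]")
for df in (9, 29, 1000):
    print(df, round(t_critical(0.95, df), 3))
```

The printed critical values shrink toward 1.96 as the degrees of freedom grow, which is the convergence to the Z-distribution described in the note.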
CI for Population Proportion
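Used for categorical data (e.g., "What % of voters support Candidate A?"). With sample proportion $\hat{p}$ from a sample of size $n$, the interval is:
$$ \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
Example: In a survey of 500 people, 300 said they like the new logo ($\hat{p} = 300/500 = 0.6$). Find the 95% CI.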
$$ 0.6 \pm 1.96 \sqrt{\frac{0.6(0.4)}{500}} $$
$$ 0.6 \pm 1.96(0.0219) = 0.6 \pm 0.043 $$
Result: [55.7%, 64.3%].
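The proportion example as a sketch, again assuming `statistics.NormalDist` for the critical value; `proportion_interval` is an illustrative name. It implements the normal-approximation (Wald) interval given by the formula above.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation CI: p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # Z times standard error
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(successes=300, n=500)  # logo survey from the text
print(f"95% CI: [{low:.1%}, {high:.1%}]")
```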
Test Yourself
Q1: If you increase the sample size ($n$), what happens to the width of the confidence interval?
Q2: Which distribution do you use when $\sigma$ is unknown and $n < 30$?