Estimation & Confidence Intervals

How do we guess population parameters from sample data? We use estimation. It's the bridge between descriptive statistics and hypothesis testing.

Point Estimation

A Point Estimate is a single value used to approximate a population parameter. For example, if you survey 100 customers and their average age is 35, you estimate that the average age of all customers is 35.

Properties of a Good Estimator:
  • Unbiasedness: The expected value of the estimator equals the parameter. (e.g., Sample mean $\bar{x}$ is an unbiased estimator of $\mu$).
  • Efficiency: It has the smallest variance among all unbiased estimators.
  • Consistency: As the sample size increases, the estimator converges to the parameter (the chance of a large miss shrinks toward zero).
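
Unbiasedness and consistency can be checked empirically. The sketch below is a minimal simulation, assuming a normally distributed population with a hypothetical mean of 35 and standard deviation of 10 (illustrative figures, not real data). The function name `simulate_sample_means` is made up for this example.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size `n` and summarize their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    # Average of the sample means, and how much they vary around it
    return statistics.fmean(means), statistics.pstdev(means)

# Hypothetical population: mean 35, standard deviation 10 (assumed values)
avg_small, spread_small = simulate_sample_means(35.0, 10.0, n=25, reps=2000)
avg_large, spread_large = simulate_sample_means(35.0, 10.0, n=400, reps=2000)

print(f"n=25:  mean of sample means = {avg_small:.2f}, spread = {spread_small:.2f}")
print(f"n=400: mean of sample means = {avg_large:.2f}, spread = {spread_large:.2f}")
```

Both averages should land close to 35 (unbiasedness), while the spread of the sample means should shrink from roughly 2 to roughly 0.5 as $n$ grows from 25 to 400 (consistency).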

Confidence Intervals (CI)

A point estimate is rarely exactly right. A Confidence Interval gives us a range of values where we believe the true parameter lies, with a certain level of confidence (usually 90%, 95%, or 99%).

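Think of it as casting a net. The fish (the true mean) stays put; the net lands in a different place for every sample. With a 95% procedure, about 95% of the nets you cast will catch the fish.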

$$ \text{Point Estimate} \pm \text{Margin of Error} $$
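
The confidence level can be read as the long-run success rate of the interval procedure. The sketch below simulates that idea under stated assumptions: a normal population with hypothetical parameters, a known standard deviation, and a margin of error of $Z_{\alpha/2} \, \sigma / \sqrt{n}$. The function name `coverage_rate` is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence, reps, seed=1):
    """Fraction of simulated intervals that contain the true mean `mu`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # Point estimate plus or minus margin of error
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population values, chosen only for illustration
rate = coverage_rate(mu=900, sigma=100, n=50, confidence=0.95, reps=4000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should sit near 0.95. Any single interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across repeated samples.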

CI for Population Mean ($\sigma$ known)

When we know the population standard deviation ($\sigma$) and either the population is normal or the sample size is large ($n \ge 30$ is a common rule of thumb), we use the Z-distribution.

$$ \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) $$

Common Critical Values ($Z_{\alpha/2}$):

  • 90% Confidence: 1.645
  • 95% Confidence: 1.96
  • 99% Confidence: 2.576
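
These critical values do not need to be memorized. As a minimal sketch, they can be recovered from the standard normal distribution with Python's standard library; the helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so look up the (1 - alpha/2) quantile
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: {z_critical(level):.3f}")
```
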

Example: Light Bulb Life

A factory produces bulbs. We know $\sigma = 100$ hours. We test 50 bulbs ($n=50$) and find mean life $\bar{x} = 900$ hours. Find the 95% CI.

$$ 900 \pm 1.96 \left( \frac{100}{\sqrt{50}} \right) $$

$$ 900 \pm 1.96(14.14) = 900 \pm 27.7 $$

Result: [872.3, 927.7] hours.
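
The same calculation can be scripted. This is a small sketch using only the standard library; `z_interval` is an illustrative name, and the inputs are the figures from the light bulb example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # critical value times standard error
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=900, sigma=100, n=50)
print(f"95% CI: [{low:.1f}, {high:.1f}] hours")
```

Changing `confidence` to 0.99 widens the interval, because a larger critical value is needed to capture the mean more often.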

CI for Population Mean ($\sigma$ unknown)

In real life, we rarely know $\sigma$. We use the sample standard deviation ($s$) instead. Because of this extra uncertainty, we use the t-distribution.

$$ \bar{x} \pm t_{\alpha/2, n-1} \left( \frac{s}{\sqrt{n}} \right) $$

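Note: The t-distribution has a lower peak and heavier tails than the Z-distribution, so its critical values are larger and the resulting intervals are wider. As $n$ increases, it looks more and more like Z.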
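
As an illustration of the t-based formula, the sketch below uses ten hypothetical bulb lifetimes (invented for this example, not taken from the text). Python's standard library has no t quantile function, so the critical value $t_{0.025,\,9} = 2.262$ is hard-coded from a standard t-table; `t_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def t_interval(sample, t_crit):
    """Confidence interval for the mean when sigma is unknown."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical lifetimes in hours (assumed data, n = 10)
lifetimes = [880, 910, 895, 925, 870, 905, 890, 915, 900, 910]

# Table value for 95% confidence with n - 1 = 9 degrees of freedom
T_CRIT_95_DF9 = 2.262

low, high = t_interval(lifetimes, T_CRIT_95_DF9)
print(f"95% CI with t: [{low:.2f}, {high:.2f}] hours")

# For comparison only: the same data with the Z critical value
z = NormalDist().inv_cdf(0.975)
z_margin = z * stdev(lifetimes) / sqrt(len(lifetimes))
print(f"t margin = {(high - low) / 2:.2f}, z margin = {z_margin:.2f}")
```

The t margin comes out larger than the Z margin for the same data, which reflects the extra uncertainty from estimating $\sigma$ with $s$.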

CI for Population Proportion

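Used for categorical data (e.g., "What % of voters support Candidate A?"). The formula relies on a normal approximation, which is usually considered adequate when $n\hat{p}$ and $n(1-\hat{p})$ are both at least about 10.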

$$ \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$

Example: Marketing Survey

In a survey of 500 people, 300 said they like the new logo ($\hat{p} = 300/500 = 0.6$). Find the 95% CI.

$$ 0.6 \pm 1.96 \sqrt{\frac{0.6(0.4)}{500}} $$

$$ 0.6 \pm 1.96(0.0219) = 0.6 \pm 0.043 $$

Result: [55.7%, 64.3%].
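
The survey calculation can be reproduced in a few lines. This is a sketch of the normal-approximation interval from the formula above, using only the standard library; `proportion_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # critical value times standard error
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=300, n=500)
print(f"95% CI: [{low:.1%}, {high:.1%}]")
```
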

Test Yourself

Q1: If you increase the sample size ($n$), what happens to the width of the confidence interval?

  • It gets wider
  • It gets narrower (more precise)
  • Stays the same

Q2: Which distribution do you use when $\sigma$ is unknown and $n < 30$?

  • Normal (Z)
  • Student's t
  • Chi-Square