Descriptive Statistics
Before we can analyze data, we must describe it. Where is the center? How spread out is it?
Measures of Central Tendency
- Mean ($\bar{x}$): The average. Sensitive to outliers.
- Median: The middle value. Robust to outliers.
- Mode: The most frequent value.
$$ \bar{x} = \frac{\sum x}{n} $$
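To make the contrast between the three measures concrete, here is a minimal sketch using Python's standard `statistics` module. The `salaries` list is a made-up sample chosen for illustration, with one extreme value included on purpose.

```python
import statistics

# Hypothetical sample: seven salaries (in thousands) with one extreme value.
salaries = [42, 45, 45, 48, 50, 52, 250]

mean = statistics.mean(salaries)      # sum of the values divided by n
median = statistics.median(salaries)  # middle value of the sorted data
mode = statistics.mode(salaries)      # most frequent value

# Drop the extreme value to see how much it pulled each measure.
trimmed = salaries[:-1]
mean_without_outlier = statistics.mean(trimmed)
median_without_outlier = statistics.median(trimmed)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"without outlier: mean={mean_without_outlier}, median={median_without_outlier}")
```

In this sample the single outlier moves the mean from 47 to 76, while the median only shifts from 46.5 to 48.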
Measures of Dispersion
How spread out is the data?
- Range: Max - Min.
- Variance ($s^2$): Sum of squared deviations from the mean, divided by $n-1$ for a sample. Expressed in squared units.
- Standard Deviation ($s$): Square root of the variance. Roughly the typical distance from the mean, in the original units.
- Coefficient of Variation (CV): Relative variability. $CV = (s/\bar{x}) \times 100\%$.
$$ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n-1}} $$
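The dispersion formulas can be checked by hand on a small sample. The sketch below computes each measure directly from its definition and cross-checks against the standard library. The `data` values are an arbitrary example, not taken from any real dataset.

```python
import math
import statistics

# Hypothetical sample used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Sample variance: squared deviations summed, then divided by n - 1.
squared_devs = sum((x - x_bar) ** 2 for x in data)
variance = squared_devs / (n - 1)

# Sample standard deviation: square root of the sample variance.
std_dev = math.sqrt(variance)

# Coefficient of variation, as a percentage of the mean.
cv = std_dev / x_bar * 100

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))

print(f"range={data_range}, s^2={variance:.3f}, s={std_dev:.3f}, CV={cv:.1f}%")
```

Note that `statistics.variance` and `statistics.stdev` use the $n-1$ denominator shown above. The `pvariance` and `pstdev` functions divide by $n$ instead.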
Shape of Distribution
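- Skewness: Measure of asymmetry.
  - Positive Skew: Longer tail on the right (typically Mean > Median).
  - Negative Skew: Longer tail on the left (typically Mean < Median).
- Kurtosis: Measure of "tailedness", i.e. how heavy the tails are relative to a normal distribution. It is often described as peakedness, but it is driven mainly by the tails.
  - Leptokurtic: Fat tails, often with a sharp peak (kurtosis above 3, the value for a normal distribution).
  - Platykurtic: Thin tails, often with a flat peak (kurtosis below 3).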
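Skewness and kurtosis can be computed from central moments. The sketch below assumes one common convention, the simple moment-based (population) formulas $m_3 / m_2^{3/2}$ and $m_4 / m_2^2 - 3$. Statistical packages often apply small-sample corrections, so their values may differ slightly. The helper names and sample lists are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)**k over the data."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: positive for a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical samples chosen to show each case.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]

print("symmetric skew:", skewness(symmetric))
print("right-skewed skew:", skewness(right_skewed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
print("heavy-tailed excess kurtosis:", excess_kurtosis(heavy_tailed))
```

Under this convention, positive excess kurtosis corresponds to leptokurtic data and negative excess kurtosis to platykurtic data.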
Visualization
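Box Plot (Box-and-Whisker): Shows the 5-number summary (Min, Q1, Median, Q3, Max). Many plotting tools draw the whiskers only out to 1.5 times the interquartile range and mark points beyond that as outliers.
Histogram: Shows the frequency distribution by counting how many values fall into each bin.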
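Both plots are drawn from simple summaries of the data, which can be computed without a plotting library. The sketch below derives the 5-number summary and the bin counts behind a histogram. The `data` list and the bin width of 2 are arbitrary choices for illustration, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics
from collections import Counter

# Hypothetical sample used only for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# 5-number summary: the values a box plot is drawn from.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))
print("min, Q1, median, Q3, max:", five_number)

# Histogram counts: assign each value to the lower edge of its bin.
bin_width = 2
counts = Counter((x // bin_width) * bin_width for x in data)

# Text histogram: one row per non-empty bin, one '#' per observation.
for left_edge in sorted(counts):
    label = f"[{left_edge}, {left_edge + bin_width})"
    print(f"{label:>8} {'#' * counts[left_edge]}")
```

Changing `bin_width` changes the apparent shape of the histogram, so it is worth trying a few values before drawing conclusions about the distribution.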
Test Yourself
Q1: Which measure of central tendency is most affected by extreme outliers?
Q2: If Mean > Median, the distribution is likely: