Descriptive Statistics

Before we can analyze data, we must describe it. Where is the center? How spread out is it?

Measures of Central Tendency

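  • Mean ($\bar{x}$): The arithmetic average: the sum of the values divided by their count. Sensitive to outliers.
  • Median: The middle value of the sorted data (the average of the two middle values when $n$ is even). Robust to outliers.
  • Mode: The most frequent value. A dataset can have more than one mode, or none if every value occurs once.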
$$ \bar{x} = \frac{\sum x}{n} $$
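
The three measures above can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the sample values are made up for illustration, with one deliberate outlier (100) to show how the mean reacts while the median barely moves.

```python
import statistics

# Hypothetical sample: seven ordinary values plus one extreme outlier (100).
data = [2, 3, 3, 4, 5, 6, 7, 100]

mean = statistics.mean(data)      # sum of values / n
median = statistics.median(data)  # average of the two middle values, since n is even
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)         # 16.25 4.5 3

# Drop the outlier and recompute: the mean changes a lot, the median only slightly.
trimmed = data[:-1]
trimmed_mean = statistics.mean(trimmed)      # 30 / 7, about 4.29
trimmed_median = statistics.median(trimmed)  # 4
print(round(trimmed_mean, 2), trimmed_median)
```

Removing a single value moves the mean from 16.25 to about 4.29, while the median moves only from 4.5 to 4.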

Measures of Dispersion

How spread out is the data?

  • Range: Max - Min.
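  • Variance ($s^2$): Average squared deviation from the mean. The sample variance divides by $n-1$ rather than $n$.
  • Standard Deviation ($s$): Square root of the variance, expressed in the same units as the data. Roughly the typical distance of a value from the mean.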
  • Coefficient of Variation (CV): Variability relative to the mean, useful for comparing spread across datasets with different units or scales. $CV = (s/\bar{x}) \times 100\%$.
$$ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n-1}} $$
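
As a worked illustration of the formula above, the sketch below computes each dispersion measure step by step on a made-up sample and compares the results with the `statistics` module, which also uses the $n-1$ divisor for `variance` and `stdev`. The variable names are illustrative only.

```python
import math
import statistics

# Hypothetical sample chosen so the mean is a whole number.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n                                      # mean = 40 / 8 = 5.0
data_range = max(data) - min(data)                         # 9 - 2 = 7
variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)   # 32 / 7
std_dev = math.sqrt(variance)                              # square root of the variance
cv = std_dev / x_bar * 100                                 # relative variability, in percent

print(data_range, round(variance, 3), round(std_dev, 3), round(cv, 2))

# The standard library agrees with the manual calculation.
print(math.isclose(variance, statistics.variance(data)))
print(math.isclose(std_dev, statistics.stdev(data)))
```

Dividing by $n$ instead of $n-1$ would give the population variance (`statistics.pvariance`), which is slightly smaller for the same data.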

Shape of Distribution

  • Skewness: Measure of asymmetry.
    • Positive Skew: Tail on the right (typically Mean > Median).
    • Negative Skew: Tail on the left (typically Mean < Median).
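  • Kurtosis: Measure of "tailedness": how heavy the tails are compared with a normal distribution. Often loosely described as peakedness.
    • Leptokurtic: Fat tails, usually with a high peak (more kurtosis than a normal distribution).
    • Platykurtic: Thin tails, usually with a flat peak (less kurtosis than a normal distribution).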
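
Skewness and kurtosis can be computed from the central moments of the data. The sketch below assumes the simple moment-based definitions (dividing by $n$) and reports excess kurtosis, that is, kurtosis minus 3, so that a normal distribution scores 0. Statistical packages often apply bias corrections, so their results may differ slightly. The function names and sample values are illustrative.

```python
import statistics

def central_moment(xs, k):
    """k-th central moment: the average of (x - mean)^k, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3 (positive: leptokurtic, negative: platykurtic)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

# Hypothetical samples: one with a long right tail, one perfectly symmetric.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)   # True: tail on the right
print(statistics.mean(right_skewed) > statistics.median(right_skewed))  # True: 3.5 > 3
print(skewness(symmetric))          # 0.0: no asymmetry
print(excess_kurtosis(symmetric) < 0)  # True: evenly spread values are platykurtic
```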

Visualization

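Box Plot (Box-and-Whisker): Shows the 5-number summary (Min, Q1, Median, Q3, Max). The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the Min and Max. Many plotting tools cap the whiskers at 1.5 × IQR beyond the box and draw more extreme points individually as outliers.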

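Histogram: Shows the frequency distribution. The data are grouped into bins, and the height of each bar is the number of values that fall in that bin.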
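
The numbers behind both plots can be computed without any plotting library. The sketch below derives the 5-number summary and a text histogram for a made-up sample. It assumes the "inclusive" quartile method of `statistics.quantiles` and a bin width of 4; other quartile conventions and bin widths are equally valid and give somewhat different results.

```python
import statistics
from collections import Counter

# Hypothetical sample, already sorted for readability.
data = [1, 2, 2, 3, 4, 5, 5, 6, 7, 9, 12]

# Five-number summary: the values a box plot draws.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))
print(five_number)

# Histogram counts: group values into bins [0, 4), [4, 8), [8, 12), [12, 16).
bin_width = 4  # assumed bin width, chosen for illustration
counts = Counter(x // bin_width for x in data)

# Print one row per bin, with one '#' per value in that bin.
for b in sorted(counts):
    low = b * bin_width
    print(f"[{low:2d}, {low + bin_width:2d}) {'#' * counts[b]}")
```

A wider or narrower bin width changes the shape the histogram suggests, so it is worth trying a few values before drawing conclusions.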

Test Yourself

Q1: Which measure of central tendency is most affected by extreme outliers?

  • Mean
  • Median
  • Mode

Q2: If Mean > Median, the distribution is likely:

  • Symmetric
  • Positively Skewed (Right)
  • Negatively Skewed (Left)