Multiple Regression Analysis

Real outcomes rarely depend on a single factor. Sales depend on price, advertising, and season all at once. Multiple regression models one outcome as a function of several predictors at the same time.

The Multiple Regression Equation

$$ \hat{Y} = b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k $$

Interpretation of Coefficients ($b_i$):

The predicted change in $Y$ for a one-unit increase in $X_i$, holding all other variables constant. The intercept $b_0$ is the predicted value of $Y$ when every $X_i$ equals 0.
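
To make "holding all other variables constant" concrete, here is a minimal sketch that fits a two-predictor model by ordinary least squares with NumPy. The data are synthetic and noise-free, generated from made-up coefficients (2, 3, and -1), so the fit should recover them. None of these numbers come from the lesson, and the name `predict` is just an illustrative helper.

```python
import numpy as np

# Synthetic, noise-free data (illustrative values, not from the lesson).
rng = np.random.default_rng(0)
X1 = rng.uniform(0, 10, size=50)
X2 = rng.uniform(0, 10, size=50)
Y = 2.0 + 3.0 * X1 - 1.0 * X2

# Design matrix: a column of ones for the intercept b0, then X1 and X2.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: choose b to minimize ||A @ b - Y||^2.
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = b

def predict(x1, x2):
    """Predicted Y-hat for the given predictor values."""
    return b0 + b1 * x1 + b2 * x2

# Raise X1 by one unit while X2 stays fixed: Y-hat changes by b1.
effect_of_x1 = predict(6.0, 4.0) - predict(5.0, 4.0)
```

Comparing `effect_of_x1` with `b1` shows that the coefficient is exactly the change in the prediction when only $X_1$ moves.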

House Price Model

$\hat{Y} = 50{,}000 + 100(\text{SqFt}) + 5{,}000(\text{Bedrooms})$

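If you add one bedroom ($X_2$), the predicted price increases by \$5,000, assuming size ($X_1$) stays the same. Likewise, each additional square foot adds \$100 to the predicted price when the number of bedrooms stays the same.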
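
As a quick check of the arithmetic, the model can be written as a small function. The 2,000 sq ft, 3-bedroom house is a hypothetical example, and `predict_price` is an illustrative name, not part of the lesson.

```python
def predict_price(sqft, bedrooms):
    """Predicted price in dollars from the house price model above."""
    return 50_000 + 100 * sqft + 5_000 * bedrooms

# Hypothetical house: 2,000 sq ft with 3 bedrooms.
three_bed = predict_price(2000, 3)  # 50,000 + 200,000 + 15,000
four_bed = predict_price(2000, 4)   # same size, one more bedroom

print(three_bed)             # 265000
print(four_bed - three_bed)  # 5000, the bedroom coefficient
```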

Adjusted $R^2$

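In simple regression, $R^2$ tells us the percentage of the variance in $Y$ that the model explains. But in multiple regression, adding any variable, even a junk one, never lowers $R^2$ and almost always raises it a little. So $R^2$ alone cannot tell you whether a new variable belongs in the model.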

Adjusted $R^2$ penalizes you for each variable you add. It goes up only if the new variable improves the fit enough to offset that penalty; otherwise it goes down.
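
The standard formula, with $n$ observations and $k$ predictors, is:

$$ R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1} $$

The sketch below illustrates the idea on synthetic data (all values are made up for illustration). It fits a model with one real predictor, then adds a column of pure noise called `junk`, and reports both measures. $R^2$ cannot fall when the junk column is added, while Adjusted $R^2$ is free to fall.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    ss_res = np.sum(resid ** 2)           # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic data: y depends on x; junk is noise unrelated to y.
rng = np.random.default_rng(42)
n = 30
x = rng.normal(size=n)
junk = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

r2_base = r_squared(x.reshape(-1, 1), y)
r2_junk = r_squared(np.column_stack([x, junk]), y)

adj_base = adjusted_r_squared(r2_base, n, k=1)
adj_junk = adjusted_r_squared(r2_junk, n, k=2)

print(f"R^2:          {r2_base:.4f} -> {r2_junk:.4f}")
print(f"Adjusted R^2: {adj_base:.4f} -> {adj_junk:.4f}")
```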

Dummy Variables

How do we include categorical data (e.g., "Has Pool" vs "No Pool")?

We code each category as a number, using 0 and 1.

  • $X_3 = 1$ if Pool
  • $X_3 = 0$ if No Pool

If $b_3 = 10{,}000$, then a house with a pool is predicted to be worth \$10,000 more than an otherwise identical house without one.
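
Here is a minimal sketch of the encoding in code. It assumes, purely for illustration, that the pool term is added to the house price model from earlier in this lesson; the function names and the example house are hypothetical.

```python
def encode_pool(label):
    """Dummy variable X_3: 1 if the house has a pool, 0 otherwise."""
    return 1 if label == "Pool" else 0

def predict_price(sqft, bedrooms, pool_label):
    # b_0, b_1, b_2 are reused from the house price model; combining them
    # with b_3 = 10,000 in one equation is an assumption for illustration.
    x3 = encode_pool(pool_label)
    return 50_000 + 100 * sqft + 5_000 * bedrooms + 10_000 * x3

with_pool = predict_price(2000, 3, "Pool")
without_pool = predict_price(2000, 3, "No Pool")

print(with_pool - without_pool)  # 10000, the pool coefficient
```

Because $X_3$ can only be 0 or 1, its coefficient acts as a fixed shift in the prediction rather than a slope.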

Test Yourself

Q1: Why do we use Adjusted $R^2$ instead of regular $R^2$ in multiple regression?

  • It's easier to calculate
  • To account for the number of predictors
  • It is always higher