Multiple Regression Analysis
Real life is complex. Sales depend on price, AND advertising, AND season. Multiple regression handles this by modeling one outcome as a function of several predictors at once.
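The Multiple Regression Equation
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
Here $b_0$ is the intercept and $b_1, \dots, b_k$ are the coefficients on the predictors $X_1, \dots, X_k$.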
Interpretation of Coefficients ($b_i$):
The change in the predicted $Y$ for a one-unit increase in $X_i$, holding all other variables constant.
Example: $\hat{Y} = 50{,}000 + 100(\text{SqFt}) + 5{,}000(\text{Bedrooms})$
If you add one bedroom ($X_2$), the predicted price increases by \$5,000, assuming size ($X_1$) stays the same.
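To make the "holding all other variables constant" idea concrete, here is a minimal Python sketch of the example equation. The function name `predict_price` and the 2,000 sq ft, 3-bedroom house are hypothetical, chosen only for illustration.

```python
def predict_price(sqft, bedrooms):
    """Predicted price from the example equation (coefficients taken from the text)."""
    intercept = 50_000   # b_0
    b_sqft = 100         # b_1: dollars per extra square foot
    b_bedrooms = 5_000   # b_2: dollars per extra bedroom
    return intercept + b_sqft * sqft + b_bedrooms * bedrooms


# Hypothetical house: same size, one extra bedroom.
base = predict_price(sqft=2000, bedrooms=3)
one_more_bedroom = predict_price(sqft=2000, bedrooms=4)

print(base)                     # 265000
print(one_more_bedroom - base)  # 5000, which is exactly b_2
```

Because square footage is identical in both calls, the difference between the two predictions is just the bedroom coefficient.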
Adjusted $R^2$
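In simple regression, $R^2$ tells us the percentage of the variance in $Y$ that the model explains. But when you add a variable, even a junk one, $R^2$ never goes down, and in practice it almost always creeps up.
Adjusted $R^2$ penalizes you for each variable you add. It goes up only if the new variable improves the fit by more than a useless variable would be expected to by chance; otherwise it goes down.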
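The penalty can be seen in the standard formula, $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ is the number of predictors. The sketch below applies it to made-up numbers (the $R^2$ values and sample size are assumptions for illustration, not results from real data): a junk variable nudges $R^2$ from 0.800 to 0.801, yet adjusted $R^2$ falls.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted in k)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model with 30 observations.
before = adjusted_r2(r2=0.800, n=30, k=2)  # two useful predictors
after = adjusted_r2(r2=0.801, n=30, k=3)   # plus one junk predictor

print(after < before)  # True: the tiny gain in R^2 does not cover the penalty
```

Here adjusted $R^2$ drops from roughly 0.785 to roughly 0.778, even though plain $R^2$ went up.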
Dummy Variables
How do we include categorical data (e.g., "Has Pool" vs "No Pool")?
We use 0 and 1.
- $X_3 = 1$ if Pool
- $X_3 = 0$ if No Pool
If $b_3 = 10{,}000$, then having a pool adds \$10,000 to the predicted value, holding the other variables constant.
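As a rough sketch of how the 0/1 coding works in practice, the Python below turns a yes/no pool flag into $X_3$ and adds it to the housing equation. Reusing the intercept and the square-footage and bedroom coefficients from the earlier example is an assumption made for illustration, and the helper names `pool_dummy` and `predict_price_with_pool` are hypothetical.

```python
def pool_dummy(has_pool):
    """Code the category as a dummy variable: 1 if Pool, 0 if No Pool."""
    return 1 if has_pool else 0


def predict_price_with_pool(sqft, bedrooms, has_pool):
    """Housing equation with an added pool dummy X_3 and b_3 = 10,000."""
    x3 = pool_dummy(has_pool)
    return 50_000 + 100 * sqft + 5_000 * bedrooms + 10_000 * x3


# Two otherwise identical hypothetical houses.
no_pool = predict_price_with_pool(sqft=2000, bedrooms=3, has_pool=False)
with_pool = predict_price_with_pool(sqft=2000, bedrooms=3, has_pool=True)

print(with_pool - no_pool)  # 10000, which is exactly b_3
```

When $X_3 = 0$ the pool term vanishes, so $b_3$ is simply the gap in predicted price between the two groups.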
Test Yourself
Q1: Why do we use Adjusted $R^2$ instead of regular $R^2$ in multiple regression?