| Term | Estimate | Std. Error | Statistic | P-value |
|---|---|---|---|---|
| Intercept | 54.0 | 0.315 | 171 | 0 |
| gdpPercap | 0.000765 | 0.0000258 | 29.7 | 3.57e-156 |
EC 320 - Introduction to Econometrics
2025
Suppose we would like to estimate the degree to which an increase in GDP correlates with Life expectancy. We set up our model as follows:
\[ {\text{Life Expectancy}_i} = \beta_0 + \beta_1 \text{GDP}_i + u_i \]
Using the gapminder package in R, we could quickly generate estimates to get at the correlation. But first, as always, let’s plot it before running the regression.
Visualize the OLS fit. Is \(\beta_1\) positive or negative?
Using the gapminder data, we could quickly generate estimates for
\[ \widehat{\text{Life Expectancy}_i} = \hat{\beta_0} + \hat{\beta_1} \cdot \text{GDP}_i \]
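A minimal sketch of this workflow in R, assuming the gapminder, ggplot2, and broom packages are installed; the tidy() call should return a coefficient table like the one at the top of these notes.

```r
# Packages (assumed installed)
library(gapminder)
library(ggplot2)
library(broom)

# Plot it before running the regression
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE)

# Fit life expectancy on GDP per capita by OLS
ols_fit <- lm(lifeExp ~ gdpPercap, data = gapminder)
tidy(ols_fit)  # term, estimate, std.error, statistic, p.value
```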
Fitting OLS. But are you satisfied? Can we do better?
Up to this point, we’ve acknowledged OLS as a “linear” estimator.
Many economic relationships are nonlinear.
The “linear” in simple linear regression refers to the linearity of the parameters or coefficients, not the predictors themselves.
OLS is flexible and can accommodate a subset of nonlinear relationships.
Put differently, the model can be a linear combination of the parameters, regardless of any nonlinear transformations applied to the independent variables.
Linear-in-parameters: Parameters enter the model as a weighted sum, where the weights are functions of the variables.
Linear-in-variables: Variables enter the model as a weighted sum, where the weights are functions of the parameters.
The standard linear regression model satisfies both properties:
\[Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \dots + \beta_kX_{ki} + u_i\]
Which of the following are examples of linear-in-parameters, linear-in-variables, or neither?
1. \(Y_i = \beta_0 + \beta_1X_{i} + \beta_2X_{i}^2 + \dots + \beta_kX_{i}^k + u_i\)
2. \(Y_i = \beta_0X_i^{\beta_1}v_i\)
3. \(Y_i = \beta_0 + \beta_1\beta_2X_{i} + u_i\)
Model 1 is linear-in-parameters, but not linear-in-variables.
Model 2 is neither.
Model 3 is linear-in-variables, but not linear-in-parameters.
The natural log is the inverse of the exponential function:
\[ \log(e^x) = x \quad \text{for all } x, \qquad e^{\log(x)} = x \quad \text{for } x > 0 \]
(Natural) Log rules:
1. Product rule: \(\log(AB) = \log(A) + \log(B)\).
2. Quotient rule: \(\log(A/B) = \log(A) - \log(B)\).
3. Power rule: \(\log(A^B) = B \cdot \log(A)\).
4. Derivative: \(f(x) = \log(x)\) => \(f'(x) = \dfrac{1}{x}\).
Note: \(\log(e) = 1\), \(\log(1) = 0\), and \(\log(x)\) is undefined for \(x \leq 0\).
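These rules are easy to check numerically; a quick sketch in R (where log() is the natural log by default):

```r
log(exp(2))                      # inverse of the exponential: returns 2
log(3 * 7) - (log(3) + log(7))   # product rule: 0 (up to rounding)
log(3 / 7) - (log(3) - log(7))   # quotient rule: 0
log(3^7)   - 7 * log(3)          # power rule: 0
log(exp(1))                      # log(e) = 1
log(1)                           # 0
log(-1)                          # NaN (with a warning): undefined for x <= 0
```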
Nonlinear Model
\[ Y_i = \alpha e^{\beta_1 X_i}v_i \]
Logarithmic Transformation
\[ \log(Y_i) = \log(\alpha) + \beta_1 X_i + \log(v_i) \]
Redefine \(\log(\alpha) \equiv \beta_0\), \(\log(v_i) \equiv u_i\).
Transformed (Linear) Model
\[ \log(Y_i) = \beta_0 + \beta_1 X_i + u_i \]
Can estimate with OLS, but interpretation changes.
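A minimal sketch in R of estimating the transformed model, using simulated data whose true coefficients echo the schooling example below (all names hypothetical):

```r
set.seed(123)

# Simulate from the nonlinear model Y = exp(b0 + b1 * X) * v
n      <- 1000
school <- runif(n, 8, 20)
pay    <- exp(2.9 + 0.03 * school + rnorm(n, sd = 0.1))

# Estimate the transformed (linear) model by OLS
log_linear_fit <- lm(log(pay) ~ school)
coef(log_linear_fit)  # slope should be close to 0.03
```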
Regression Model
\[ \log(Y_i) = \beta_0 + \beta_1 X_i + u_i \]
Interpretation
If \(\log(\hat{\text{Pay}_i}) = 2.9 + 0.03 \cdot \text{School}_i\), then an additional year of schooling increases pay by approximately 3 percent, on average.
Derivation: consider the log-linear model
\[ \log(Y) = \beta_0 + \beta_1 \, X + u \]
and differentiate
\[ \dfrac{dY}{Y} = \beta_1 dX \]
Marginal change in \(X\) (\(dX\)) leads to a \(\beta_1 dX\) proportionate change in \(Y\).
\[ \log(\hat{Y}_{i}) = 10.02 + 0.73 \cdot X_{i} \]
Note: If you have a log-linear model with a binary indicator variable, the interpretation of the coefficient on that variable changes. Consider
\[ \log(Y_i) = \beta_0 + \beta_1 X_i + u_i \]
for binary variable \(X\).
Interpretation of \(\beta_1\):
Take a binary explanatory variable: trained
trained = 1 if employee \(i\) received training
trained = 0 if employee \(i\) did not receive training

| Term | Estimate | Std. Error | Statistic | P-value |
|---|---|---|---|---|
| Intercept | 9.94 | 0.0446 | 223 | 0 |
| Trained | 0.557 | 0.0631 | 8.83 | 4.72e-18 |
Q. How do we interpret the coefficient on trained?
A1: Trained workers are 74.52 percent more productive than untrained workers.
A2: Untrained workers are 42.7 percent less productive than trained workers.
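The exact percentage effects come from exponentiating the coefficient on trained; a quick check in R with the rounded estimate from the table:

```r
b1 <- 0.557  # estimated coefficient on trained

(exp(b1) - 1) * 100    # trained vs. untrained: about +74.5 percent
(exp(-b1) - 1) * 100   # untrained vs. trained: about -42.7 percent
```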
Nonlinear Model
\[ Y_i = \alpha X_i^{\beta_1}v_i \]
Logarithmic Transformation
\[ \log(Y_i) = \log(\alpha) + \beta_1 \log(X_i) + \log(v_i) \]
Redefine \(\log(\alpha) \equiv \beta_0\), \(\log(v_i) \equiv u_i\).
Transformed (Linear) Model
\[ \log(Y_i) = \beta_0 + \beta_1 \log(X_i) + u_i \]
Can estimate with OLS, but interpretation changes.
\[ \log(Y_i) = \beta_0 + \beta_1 \log(X_i) + u_i \]
Interpretation
If \(\log(\widehat{\text{Quantity Demanded}}_i) = 0.45 - 0.31 \cdot \log(\text{Income}_i)\), then each one-percent increase in income decreases quantity demanded by 0.31 percent.
Consider the log-log model
\[ \log(Y) = \beta_0 + \beta_1 \log(X) + u \]
and differentiate
\[ \dfrac{dY}{Y} = \beta_1 \dfrac{dX}{X} \]
A one-percent increase in \(X\) leads to a \(\beta_1\)-percent increase in \(Y\).
\[ \dfrac{dY}{dX} \dfrac{X}{Y} = \beta_1 \]
\[ \log(\hat{Y}_{i}) = 0.01 + 2.99 \cdot \log(X_{i}) \]
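A log-log fit follows the same pattern in R; a sketch with simulated data where the true elasticity is 3, roughly matching the fitted line above (names hypothetical):

```r
set.seed(456)

# Simulate from the constant-elasticity model Y = alpha * X^3 * v
n <- 1000
x <- runif(n, 1, 10)
y <- 1.01 * x^3 * exp(rnorm(n, sd = 0.2))

# The coefficient on log(x) is the estimated elasticity
log_log_fit <- lm(log(y) ~ log(x))
coef(log_log_fit)  # slope should be close to 3
```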
Nonlinear Model
\[ e^{Y_i} = \alpha X_i^{\beta_1}v_i \]
Logarithmic Transformation
\[ Y_i = \log(\alpha) + \beta_1 \log(X_i) + \log(v_i) \]
Redefine \(\log(\alpha) \equiv \beta_0\), \(\log(v_i) \equiv u_i\).
Transformed (Linear) Model
\[ Y_i = \beta_0 + \beta_1 \log(X_i) + u_i \]
Can estimate with OLS, but interpretation changes.
Regression Model
\[ Y_i = \beta_0 + \beta_1 \log(X_i) + u_i \]
Interpretation
If \(\widehat{\text{Blood Pressure}}_i = 150 - 9.1 \log(\text{Income}_i)\), then a one-percent increase in income decreases blood pressure by 0.091 points.
Consider the linear-log model
\[ Y = \beta_0 + \beta_1 \log(X) + u \]
and differentiate
\[ dY = \beta_1 \dfrac{dX}{X} \]
A one-percent increase in \(X\) leads to a \(\beta_1 \div 100\) change in \(Y\).
\[ \hat{Y}_{i} = 0 + 0.99 \cdot \log(X_{i}) \]
Summary by model type:
Linear-Linear \(\rightarrow Y_{i} = \beta_{0} + \beta_{1} X_{i} + u_{i}\)
Log-Linear \(\rightarrow \log(Y_{i}) = \beta_{0} + \beta_{1} X_{i} + u_{i}\)
Log-Log \(\rightarrow \log(Y_{i}) = \beta_{0} + \beta_{1} \log(X_{i}) + u_{i}\)
Linear-Log \(\rightarrow Y_{i} = \beta_{0} + \beta_{1} \log(X_{i}) + u_{i}\)
\[ \widehat{\text{Life Expectancy}}_i = 53.96 + 8 \times 10^{-4} \cdot \text{GDP}_i \quad\quad R^2 = 0.34 \]
\[ \log(\widehat{\text{Life Expectancy}}_i) = 3.97 + 1.3 \times 10^{-5} \cdot \text{GDP}_i \quad\quad R^2 = 0.3 \]
\[ \log(\widehat{\text{Life Expectancy}}_i) = 2.86 + 0.15 \cdot \log(\text{GDP}_i) \quad\quad R^2 = 0.61 \]
\[ \widehat{\text{Life Expectancy}}_i = -9.1 + 8.41 \cdot \log(\text{GDP}_i) \quad\quad R^2 = 0.65 \]
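A sketch of how the four fits can be reproduced in R, assuming the gapminder package is loaded; the estimates and R-squared values should land close to those reported above.

```r
library(gapminder)

# The four specifications of life expectancy on GDP per capita
linear_linear <- lm(lifeExp      ~ gdpPercap,      data = gapminder)
log_linear    <- lm(log(lifeExp) ~ gdpPercap,      data = gapminder)
log_log       <- lm(log(lifeExp) ~ log(gdpPercap), data = gapminder)
linear_log    <- lm(lifeExp      ~ log(gdpPercap), data = gapminder)

# Collect R-squared from each fit
sapply(list(linear_linear, log_linear, log_log, linear_log),
       function(m) summary(m)$r.squared)
```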
Consideration 01: Do your data take negative or zero values?
Consideration 02: What coefficient interpretation do you want?
Consideration 03: Are your data skewed?
Let’s talk about a wage regression again. Suppose we would like to estimate the effect of age on earnings. We estimate the following SLR:
\[ \text{Wage}_i = \beta_0 + \beta_1 \text{Age}_i + u_i \]
However, maybe we believe that \(\text{Wage}_i\) and \(\text{Age}_i\) have some nonlinear relationship: the effect of an additional year might differ at age 27 versus age 67. So instead, we might estimate:
\[ \text{Wage}_i = \beta_0 + \beta_1 \text{Age}_i + \beta_2 \text{Age}^2_i + u_i \]
In this model:
\[ \text{Wage}_i = \beta_0 + \beta_1 \text{Age}_i + \beta_2 \text{Age}^2_i + u_i \]
the effect of \(\text{Age}_i\) on \(\text{Wage}_i\) would be:
\[ \frac{\partial \text{Wage}_i}{\partial \text{Age}_i} = \beta_1 + 2\beta_2 \text{Age}_i \]
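In R, the squared term enters the formula through I(); a sketch assuming a data frame wages with wage and age columns (both hypothetical):

```r
# Quadratic-in-age wage regression (the data frame `wages` is hypothetical)
quad_fit <- lm(wage ~ age + I(age^2), data = wages)

# Marginal effect of age, evaluated at a given age
b <- coef(quad_fit)
marginal_effect <- function(a) b["age"] + 2 * b["I(age^2)"] * a
marginal_effect(c(27, 67))  # an extra year at age 27 vs. at age 67
```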
Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i \]
Interpretation
Sign of \(\beta_2\) indicates whether the relationship is convex (+) or concave (-)
Sign of \(\beta_1\)? 🤷
Partial derivative of \(Y\) wrt. \(X\) is the marginal effect of \(X\) on \(Y\):
\[ \color{#B48EAD}{\dfrac{\partial Y}{\partial X} = \beta_1 + 2 \beta_2 X} \]
| Term | Estimate | Std. Error | Statistic | P-value |
|---|---|---|---|---|
| Intercept | 30,046 | 138 | 218 | 0 |
| X | 158.89 | 5.81 | 27.3 | 2.58e-123 |
| \(X^{2}\) | -1.50 | 0.0564 | -26.6 | 6.19e-118 |
What is the marginal effect of \(X\) on \(Y\)?
\[ \widehat{\dfrac{\partial Y}{\partial X}} = \hat{\beta}_{1} + 2 \hat{\beta}_{2} X = 158.89 + 2(-1.50)X = 158.89 - 3X \]
Depends on level of \(X\)
What is the marginal effect of \(X\) on \(Y\), when \(X = 0\)?
\[ \widehat{\dfrac{\partial \text{Y}}{\partial \text{X}} }\Bigg|_{\small \text{X}=0} = \hat{\beta}_{1} = 158.89 \]
What is the marginal effect of \(X\) on \(Y\), when \(X = 2\)?
\[ \widehat{\dfrac{\partial \text{Y}}{\partial \text{X}} }\Bigg|_{\small \text{X}=2} = \hat{\beta}_{1} + 2 \hat{\beta}_{2} \cdot (2) = 158.89 - 5.99 = 152.9 \]
What is the marginal effect of \(X\) on \(Y\), when \(X = 7\)?
\[ \widehat{\dfrac{\partial \text{Y}}{\partial \text{X}} }\Bigg|_{\small \text{X}=7} = \hat{\beta}_{1} + 2 \hat{\beta}_{2} \cdot (7) = 158.89 - 20.98 = 137.91 \]
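The same calculations in R, using the rounded estimates from the table; the slide values (152.9, 137.91) appear to use the unrounded coefficients, so the numbers here differ slightly:

```r
b1 <- 158.89
b2 <- -1.50

# Estimated marginal effect of X on Y: b1 + 2 * b2 * X
marginal_effect <- function(x) b1 + 2 * b2 * x
marginal_effect(c(0, 2, 7))  # 158.89, 152.89, 137.89
```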
Where does the regression \(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 X_i^2\) turn?
Step 1: Take the derivative and set equal to zero.
\[ \widehat{\dfrac{\partial \text{Y}}{\partial \text{X}} } = \hat{\beta}_1 + 2\hat{\beta}_2 X = 0 \]
Step 2: Solve for \(X\).
\[ X = -\dfrac{\hat{\beta}_1}{2\hat{\beta}_2} \]
Ex. Peak of previous regression occurs at \(X = 53.02\).
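Computed with the rounded table estimates, the turning point is about 52.96; the 53.02 above presumably reflects the unrounded coefficients.

```r
# Turning point of the fitted parabola: X = -b1 / (2 * b2)
-158.89 / (2 * -1.50)  # approximately 52.96
```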
Four “identical” regressions: Intercept \(= 3\), Slope \(= 0.5\), \(R^{2} = 0.67\)
Same fitted results, but the underlying data have very different distributions, visible only when you scatter them.
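These summary statistics match R's built-in anscombe data (Anscombe's quartet), which is presumably what the slide plots; a sketch of the comparison:

```r
# Fit the same simple regression to each of the four Anscombe pairs
fits <- lapply(1:4, function(j) {
  lm(reformulate(paste0("x", j), response = paste0("y", j)), data = anscombe)
})

# Nearly identical intercepts (~3), slopes (~0.5), and R-squared (~0.67)
t(sapply(fits, function(m) {
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r_squared = summary(m)$r.squared)
}))

# The differences only appear when you plot the raw data, e.g.:
# plot(anscombe$x2, anscombe$y2)
```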
EC320, Lecture 06 | Non-Linear Models