Linear models

Author

Francisco Rodríguez-Sánchez

A simple linear model

Example dataset: forest trees

Download this dataset

Import:

Questions

What is the relationship between DBH and height?
Do taller trees have bigger trunks?
Can we predict height from DBH? How well?

Plot your data first!

Exploratory Data Analysis (EDA)

Outliers

Histogram of response variable

Histogram of predictor variable

Scatterplot

Model fitting

Now fit model

Hint: lm

which corresponds to

\[ \begin{aligned} Height_{i} = a + b \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Package `equatiomatic` returns model structure

library("equatiomatic")
m1 <- lm(height ~ dbh, data = trees)
equatiomatic::extract_eq(m1)

equatiomatic::extract_eq(m1, use_coefs = TRUE)

Model interpretation

What does this mean?

Estimated distribution of the intercept parameter

Estimated distribution of the slope parameter

Distribution of residuals

Degrees of freedom

DF = n - p

n = sample size

p = number of estimated parameters

R-squared

Proportion of ‘explained’ variance

\(R^{2} = 1 - \frac{Residual Variation}{Total Variation}\)

Adjusted R-squared

Accounts for model complexity

(number of parameters)

\(R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}\)

Quiz

https://pollev.com/franciscorod726

Retrieving model coefficients

Confidence intervals for parameters

Retrieving model parameters (easystats)

https://easystats.github.io/parameters/

Communicating results

Avoid dichotomania of statistical significance

“Never conclude there is ‘no difference’ or ‘no association’ just because p > 0.05 or CI includes zero”
Estimate and communicate effect sizes and their uncertainty
https://doi.org/10.1038/d41586-019-00857-9

Communicating results

We found a significant relationship between DBH and Height (p<0.05).
We found a {significant} positive relationship between DBH and Height {(p<0.05)} (b = 0.61, SE = 0.01).
(add p-value if you wish)

Models that describe themselves (easystats)

https://easystats.github.io/report/

Generating table with model results: `modelsummary`

https://modelsummary.com/

Generating table with model results: `modelsummary`

Visualising fitted model

Plot model: `visreg`

visreg can use ggplot2 too

visreg(m1, gg = TRUE) + theme_bw()

https://pbreheny.github.io/visreg

Plot (easystats)

Plot (modelsummary)

Plot model parameters with easystats (`see` package)

Plot parameters’ estimated distribution

Model checking

Linear model assumptions

Linearity (transformations, GAM…)
Residuals:
- Independent
- Equal variance
- Normal
Negligible measurement error in predictors

Are residuals normal?

SD = 4.09

Model checking: `plot(model)`

Model checking with `performance` (easystats)

https://easystats.github.io/performance/articles/check_model.html

A dashboard to explore the full model

model_dashboard(m1)

Making predictions with easystats

Estimate expected values

Expected values given DBH

Calibration plot: observed vs predicted

Estimate prediction interval

Accounting for residual variation!

Confidence vs Prediction interval

Make predictions for new data

Workflow

Visualise data
Understand fitted model (summary)
Visualise model (visreg…)
Check model (plot, check_model, calibration plot…)
Predict (predict, estimate_expectation, estimate_prediction)

Categorical predictors (factors)

Q: Does tree height vary with sex?

Model height ~ sex

Linear model with categorical predictors

corresponds to

\[ \begin{aligned} Height_{i} = a + b_{male} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Model height ~ sex

Quiz

https://pollev.com/franciscorod726

Let’s read the model report…

Estimated distribution of the intercept parameter

Intercept = Height of females

Estimated distribution of the beta parameter

beta = height difference of males vs females

Analysing differences among factor levels

Visualising the fitted model

Plot (visreg)

Plot (easystats)

Model checking

Model checking: residuals

Model checking (easystats)

Q: Does height differ among field sites?

Quiz

https://pollev.com/franciscorod726

Plot data first

Linear model with categorical predictors

\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Model Height ~ site

All right here?

site is a factor!

Model Height ~ site

Estimated parameter distributions

Estimated tree heights for each site

Plot estimated tree heights for each site

Analysing differences among factor levels

For finer control see emmeans package

Analysing differences among factor levels

How different are site 2 and site 9?

Presenting model results

Plot (visreg)

Plot (easystats)

Plot model (modelsummary)

Plot model (easystats)

Fit model without intercept

Model checking: residuals

Combining continuous and categorical predictors

Predicting tree height based on dbh and site

corresponds to

\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ k \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Predicting tree height based on dbh and site

Presenting model results

Estimated tree heights for each site

Fit model without intercept

Plot (visreg)

Plot model (easystats)

Keeping sites only, dropping “dbh”

Plot model (modelsummary)

Keeping sites only, dropping “dbh”

What happened to site 8?

site 8 has the largest diameters:

DBH

HEIGHT

We have fitted model w/ many intercepts and single slope

Slope is the same for all sites

Model checking: residuals

Model checking with easystats

How good is this model? Calibration plot

How good is this model? Calibration plot (easystats)

Posterior predictive checking

Simulating response data from fitted model (yrep)

and comparing with observed response (y)

Predicting heights of new trees (easystats)

Using model for prediction

Expected height of 10-cm diameter tree in each site?

Using model for prediction

Expected height of 10-cm DBH trees at each site

Using model for prediction

Prediction intervals (accounting for residual variance)

Q: Does allometric relationship between Height and Diameter vary among sites?

Model with interactions

Does slope vary among sites?

END

A simple linear model

Example dataset: forest trees

Questions

Plot your data first!

Exploratory Data Analysis (EDA)

Histogram of response variable

Histogram of predictor variable

Scatterplot

Model fitting

Now fit model

Package equatiomatic returns model structure

Model interpretation

What does this mean?

Estimated distribution of the intercept parameter

Estimated distribution of the slope parameter

Distribution of residuals

Degrees of freedom

R-squared

Adjusted R-squared

Quiz

Retrieving model coefficients

Confidence intervals for parameters

Retrieving model parameters (easystats)

Communicating results

Avoid dichotomania of statistical significance

Communicating results

Models that describe themselves (easystats)

Generating table with model results: modelsummary

Generating table with model results: modelsummary

Visualising fitted model

Plot model: visreg

Plot (easystats)

Plot (modelsummary)

Plot model parameters with easystats (see package)

Plot parameters’ estimated distribution

Model checking

Linear model assumptions

Are residuals normal?

Model checking: plot(model)

Model checking with performance (easystats)

A dashboard to explore the full model

Making predictions with easystats

Estimate expected values

Expected values given DBH

Calibration plot: observed vs predicted

Estimate prediction interval

Confidence vs Prediction interval

Make predictions for new data

Workflow

Categorical predictors (factors)

Q: Does tree height vary with sex?

Model height ~ sex

Linear model with categorical predictors

Model height ~ sex

Quiz

Let’s read the model report…

Estimated distribution of the intercept parameter

Estimated distribution of the beta parameter

Analysing differences among factor levels

Visualising the fitted model

Plot (visreg)

Plot (easystats)

Model checking

Model checking: residuals

Model checking (easystats)

Q: Does height differ among field sites?

Quiz

Plot data first

Linear model with categorical predictors

Model Height ~ site

site is a factor!

Model Height ~ site

Estimated parameter distributions

Estimated tree heights for each site

Plot estimated tree heights for each site

Analysing differences among factor levels

Analysing differences among factor levels

Presenting model results

Plot (visreg)

Plot (easystats)

Package `equatiomatic` returns model structure

Generating table with model results: `modelsummary`

Generating table with model results: `modelsummary`

Plot model: `visreg`

Plot model parameters with easystats (`see` package)

Model checking: `plot(model)`

Model checking with `performance` (easystats)