Linear models

Author

Francisco Rodríguez-Sánchez

A simple linear model

Example dataset: forest trees


    
  • Import:

    

Questions

  • What is the relationship between DBH and height?

  • Do taller trees have bigger trunks?

  • Can we predict height from DBH? How well?

Plot your data first!

Exploratory Data Analysis (EDA)

Outliers


    

Histogram of response variable


    

Histogram of predictor variable


    

Scatterplot


    

Model fitting

Now fit model

Hint: lm


    

which corresponds to

\[ \begin{aligned} Height_{i} = a + b \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Package equatiomatic returns model structure

library("equatiomatic")
m1 <- lm(height ~ dbh, data = trees)
equatiomatic::extract_eq(m1)
equatiomatic::extract_eq(m1, use_coefs = TRUE)

Model interpretation

What does this mean?


    

Estimated distribution of the intercept parameter


    

    

    

Estimated distribution of the slope parameter


    

    

Distribution of residuals


    

Degrees of freedom

DF = n - p

n = sample size

p = number of estimated parameters

R-squared

Proportion of ‘explained’ variance

\(R^{2} = 1 - \frac{Residual Variation}{Total Variation}\)

Adjusted R-squared

Accounts for model complexity

(number of parameters)

\(R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}\)

Quiz

https://pollev.com/franciscorod726

Retrieving model coefficients


    

Confidence intervals for parameters


    

Retrieving model parameters (easystats)


    

https://easystats.github.io/parameters/

Communicating results

Avoid dichotomania of statistical significance

  • “Never conclude there is ‘no difference’ or ‘no association’ just because p > 0.05 or CI includes zero

  • Estimate and communicate effect sizes and their uncertainty

  • https://doi.org/10.1038/d41586-019-00857-9

Communicating results

  • We found a significant relationship between DBH and Height (p<0.05).

  • We found a {significant} positive relationship between DBH and Height {(p<0.05)} (b = 0.61, SE = 0.01).

  • (add p-value if you wish)

Models that describe themselves (easystats)


    

https://easystats.github.io/report/

Generating table with model results: modelsummary


    

    

https://modelsummary.com/

Generating table with model results: modelsummary


    

Visualising fitted model

Plot model: visreg


    

    

visreg can use ggplot2 too

visreg(m1, gg = TRUE) + theme_bw()

https://pbreheny.github.io/visreg

Plot (easystats)


    

Plot (modelsummary)


    

Plot model parameters with easystats (see package)


    

Plot parameters’ estimated distribution


    

Model checking

Linear model assumptions

  • Linearity (transformations, GAM…)

  • Residuals:

    • Independent
    • Equal variance
    • Normal
  • Negligible measurement error in predictors

Are residuals normal?


    

SD = 4.09

Model checking: plot(model)


    

Model checking with performance (easystats)


    

https://easystats.github.io/performance/articles/check_model.html

A dashboard to explore the full model

model_dashboard(m1)

Making predictions with easystats

Estimate expected values


    

Expected values given DBH


    

Calibration plot: observed vs predicted


    

Estimate prediction interval

Accounting for residual variation!


    

Confidence vs Prediction interval


    

    

Make predictions for new data


    

    

Workflow

  • Visualise data

  • Understand fitted model (summary)

  • Visualise model (visreg…)

  • Check model (plot, check_model, calibration plot…)

  • Predict (predict, estimate_expectation, estimate_prediction)

Categorical predictors (factors)

Q: Does tree height vary with sex?


    

Model height ~ sex


    

Linear model with categorical predictors


    

corresponds to

\[ \begin{aligned} Height_{i} = a + b_{male} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Model height ~ sex


    

Quiz

https://pollev.com/franciscorod726

Let’s read the model report…


    

Estimated distribution of the intercept parameter

Intercept = Height of females


    

    

Estimated distribution of the beta parameter

beta = height difference of males vs females


    

    

Analysing differences among factor levels


    

    

Visualising the fitted model

Plot (visreg)


    

Plot (easystats)


    

Model checking

Model checking: residuals


    

    

Model checking (easystats)


    

Q: Does height differ among field sites?

Quiz

https://pollev.com/franciscorod726

Plot data first


    

Linear model with categorical predictors


    

\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Model Height ~ site

All right here?


    

site is a factor!


    

Model Height ~ site


    

Estimated parameter distributions


    

Estimated tree heights for each site


    

Plot estimated tree heights for each site


    

Analysing differences among factor levels

For finer control see emmeans package


    

Analysing differences among factor levels

How different are site 2 and site 9?


    

    

Presenting model results


    

    

Plot (visreg)


    

Plot (easystats)


    

Plot model (modelsummary)


    

Plot model (easystats)


    

Fit model without intercept


    

    

Model checking: residuals


    

Model checking: residuals


    

Combining continuous and categorical predictors

Predicting tree height based on dbh and site


    

corresponds to

\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ k \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]

Predicting tree height based on dbh and site


    

Presenting model results


    

Estimated tree heights for each site


    

Fit model without intercept


    

Plot (visreg)


    

    

Plot model (easystats)


    

Keeping sites only, dropping “dbh”


    

Plot model (modelsummary)


    

Keeping sites only, dropping “dbh”


    

What happened to site 8?


    

    

site 8 has the largest diameters:


    

DBH


    

HEIGHT


    

We have fitted model w/ many intercepts and single slope


    

Slope is the same for all sites


    

Model checking: residuals


    

Model checking with easystats


    

How good is this model? Calibration plot


    

How good is this model? Calibration plot (easystats)


    

Posterior predictive checking

Simulating response data from fitted model (yrep)

and comparing with observed response (y)


    

Predicting heights of new trees (easystats)

Using model for prediction

Expected height of 10-cm diameter tree in each site?


    

Using model for prediction

Expected height of 10-cm DBH trees at each site


    

Using model for prediction

Prediction intervals (accounting for residual variance)


    

Q: Does allometric relationship between Height and Diameter vary among sites?


    

Model with interactions


    

Does slope vary among sites?


    

    

END