library("equatiomatic")
<- lm(height ~ dbh, data = trees)
m1 ::extract_eq(m1) equatiomatic
Linear models
A simple linear model
Example dataset: forest trees
- Download this dataset
- Import:
Questions
What is the relationship between DBH and height?
Do taller trees have bigger trunks?
Can we predict height from DBH? How well?
Plot your data first!
Exploratory Data Analysis (EDA)
Outliers
Histogram of response variable
Histogram of predictor variable
Scatterplot
Model fitting
Now fit model
Hint: lm
which corresponds to
\[ \begin{aligned} Height_{i} = a + b \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Package equatiomatic
returns model structure
::extract_eq(m1, use_coefs = TRUE) equatiomatic
Model interpretation
What does this mean?
Estimated distribution of the intercept parameter
Estimated distribution of the slope parameter
Distribution of residuals
Degrees of freedom
DF = n - p
n = sample size
p = number of estimated parameters
R-squared
Proportion of ‘explained’ variance
\(R^{2} = 1 - \frac{Residual Variation}{Total Variation}\)
Adjusted R-squared
Accounts for model complexity
(number of parameters)
\(R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}\)
Quiz
https://pollev.com/franciscorod726
Retrieving model coefficients
Confidence intervals for parameters
Retrieving model parameters (easystats)
Communicating results
Avoid dichotomania of statistical significance
“Never conclude there is ‘no difference’ or ‘no association’ just because p > 0.05 or CI includes zero”
Estimate and communicate effect sizes and their uncertainty
https://doi.org/10.1038/d41586-019-00857-9
Communicating results
We found a significant relationship between DBH and Height (p<0.05).
We found a {significant} positive relationship between DBH and Height {(p<0.05)} (b = 0.61, SE = 0.01).
(add p-value if you wish)
Models that describe themselves (easystats)
Generating table with model results: modelsummary
Generating table with model results: modelsummary
Visualising fitted model
Plot model: visreg
visreg
can use ggplot2 too
visreg(m1, gg = TRUE) + theme_bw()
Plot (easystats)
Plot (modelsummary)
Plot model parameters with easystats (see
package)
Plot parameters’ estimated distribution
Model checking
Linear model assumptions
Linearity (transformations, GAM…)
Residuals:
- Independent
- Equal variance
- Normal
Negligible measurement error in predictors
Are residuals normal?
SD = 4.09
Model checking: plot(model)
Model checking with performance
(easystats)
https://easystats.github.io/performance/articles/check_model.html
A dashboard to explore the full model
model_dashboard(m1)
Making predictions with easystats
Estimate expected values
Expected values given DBH
Calibration plot: observed vs predicted
Estimate prediction interval
Accounting for residual variation!
Confidence vs Prediction interval
Make predictions for new data
Workflow
Visualise data
Understand fitted model (
summary
)Visualise model (
visreg
…)Check model (
plot
,check_model
, calibration plot…)Predict (
predict
,estimate_expectation
,estimate_prediction
)
Categorical predictors (factors)
Q: Does tree height vary with sex?
Model height ~ sex
Linear model with categorical predictors
corresponds to
\[ \begin{aligned} Height_{i} = a + b_{male} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Model height ~ sex
Quiz
https://pollev.com/franciscorod726
Let’s read the model report…
Estimated distribution of the intercept parameter
Intercept = Height of females
Estimated distribution of the beta parameter
beta = height difference of males vs females
Analysing differences among factor levels
Visualising the fitted model
Plot (visreg)
Plot (easystats)
Model checking
Model checking: residuals
Model checking (easystats)
Q: Does height differ among field sites?
Quiz
https://pollev.com/franciscorod726
Plot data first
Linear model with categorical predictors
\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Model Height ~ site
All right here?
site is a factor!
Model Height ~ site
Estimated parameter distributions
Estimated tree heights for each site
Plot estimated tree heights for each site
Analysing differences among factor levels
For finer control see emmeans
package
Analysing differences among factor levels
How different are site 2 and site 9?
Presenting model results
Plot (visreg)
Plot (easystats)
Plot model (modelsummary)
Plot model (easystats)
Fit model without intercept
Model checking: residuals
Model checking: residuals
Combining continuous and categorical predictors
Predicting tree height based on dbh and site
corresponds to
\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ k \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Predicting tree height based on dbh and site
Presenting model results
Estimated tree heights for each site
Fit model without intercept
Plot (visreg)
Plot model (easystats)
Keeping sites only, dropping “dbh”
Plot model (modelsummary)
Keeping sites only, dropping “dbh”
What happened to site 8?
site 8 has the largest diameters:
DBH
HEIGHT
We have fitted model w/ many intercepts and single slope
Slope is the same for all sites
Model checking: residuals
Model checking with easystats
How good is this model? Calibration plot
How good is this model? Calibration plot (easystats)
Posterior predictive checking
Simulating response data from fitted model (yrep
)
and comparing with observed response (y
)
Predicting heights of new trees (easystats)
Using model for prediction
Expected height of 10-cm diameter tree in each site?
Using model for prediction
Expected height of 10-cm DBH trees at each site
Using model for prediction
Prediction intervals (accounting for residual variance)
Q: Does allometric relationship between Height and Diameter vary among sites?
Model with interactions
Does slope vary among sites?
END