download.file("https://raw.githubusercontent.com/Pakillo/LM-GLM-GLMM-intro/trees/data/trees.csv",
destfile = "trees.csv", mode = "wb")
Linear models
A simple linear model
Example dataset: forest trees
- Download this dataset
- Import:
<- read.csv("trees.csv")
trees head(trees)
site dbh height sex dead
1 4 29.68 36.1 male 0
2 5 33.29 42.3 male 0
3 2 28.03 41.9 female 0
4 5 39.86 46.5 female 0
5 1 47.94 43.9 female 0
6 1 10.82 26.2 male 0
Questions
What is the relationship between DBH and height?
Do taller trees have bigger trunks?
Can we predict height from DBH? How well?
Plot your data first!
Exploratory Data Analysis (EDA)
Outliers
plot(trees$height)
Histogram of response variable
hist(trees$height)
Histogram of predictor variable
hist(trees$dbh)
Scatterplot
plot(height ~ dbh, data = trees, las = 1)
Model fitting
Now fit model
Hint: lm
<- lm(height ~ dbh, data = trees) m1
which corresponds to
\[ \begin{aligned} Height_{i} = a + b \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Package equatiomatic
returns model structure
library("equatiomatic")
<- lm(height ~ dbh, data = trees)
m1 ::extract_eq(m1) equatiomatic
::extract_eq(m1, use_coefs = TRUE) equatiomatic
Model interpretation
What does this mean?
summary(m1)
Call:
lm(formula = height ~ dbh, data = trees)
Residuals:
Min 1Q Median 3Q Max
-13.3270 -2.8978 0.1057 2.7924 12.9511
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.33920 0.31064 62.26 <2e-16 ***
dbh 0.61570 0.01013 60.79 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.093 on 998 degrees of freedom
Multiple R-squared: 0.7874, Adjusted R-squared: 0.7871
F-statistic: 3695 on 1 and 998 DF, p-value: < 2.2e-16
Estimated distribution of the intercept parameter
library("easystats")
# Attaching packages: easystats 0.7.0
✔ bayestestR 0.13.1 ✔ correlation 0.8.4
✔ datawizard 0.9.0 ✔ effectsize 0.8.6
✔ insight 0.19.6 ✔ modelbased 0.8.6
✔ performance 0.10.8 ✔ parameters 0.21.3
✔ report 0.5.7 ✔ see 0.8.1
parameters(m1)[1,]
Parameter | Coefficient | SE | 95% CI | t(998) | p
-------------------------------------------------------------------
(Intercept) | 19.34 | 0.31 | [18.73, 19.95] | 62.26 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
plot(simulate_parameters(m1), show_intercept = TRUE)
Estimated distribution of the slope parameter
::parameters(m1)[2,] parameters
Parameter | Coefficient | SE | 95% CI | t(998) | p
---------------------------------------------------------------
dbh | 0.62 | 0.01 | [0.60, 0.64] | 60.79 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
plot(simulate_parameters(m1))
Distribution of residuals
hist(residuals(m1))
Degrees of freedom
DF = n - p
n = sample size
p = number of estimated parameters
R-squared
Proportion of ‘explained’ variance
\(R^{2} = 1 - \frac{Residual Variation}{Total Variation}\)
Adjusted R-squared
Accounts for model complexity
(number of parameters)
\(R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}\)
Quiz
https://pollev.com/franciscorod726
Retrieving model coefficients
coef(m1)
(Intercept) dbh
19.3391968 0.6157036
Confidence intervals for parameters
confint(m1)
2.5 % 97.5 %
(Intercept) 18.7296053 19.948788
dbh 0.5958282 0.635579
Retrieving model parameters (easystats)
parameters(m1)
Parameter | Coefficient | SE | 95% CI | t(998) | p
-------------------------------------------------------------------
(Intercept) | 19.34 | 0.31 | [18.73, 19.95] | 62.26 | < .001
dbh | 0.62 | 0.01 | [ 0.60, 0.64] | 60.79 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
Communicating results
Avoid dichotomania of statistical significance
“Never conclude there is ‘no difference’ or ‘no association’ just because p > 0.05 or CI includes zero”
Estimate and communicate effect sizes and their uncertainty
https://doi.org/10.1038/d41586-019-00857-9
Communicating results
We found a significant relationship between DBH and Height (p<0.05).
We found a {significant} positive relationship between DBH and Height {(p<0.05)} (b = 0.61, SE = 0.01).
(add p-value if you wish)
Models that describe themselves (easystats)
report(m1)
We fitted a linear model (estimated using OLS) to predict height with dbh
(formula: height ~ dbh). The model explains a statistically significant and
substantial proportion of variance (R2 = 0.79, F(1, 998) = 3695.40, p < .001,
adj. R2 = 0.79). The model's intercept, corresponding to dbh = 0, is at 19.34
(95% CI [18.73, 19.95], t(998) = 62.26, p < .001). Within this model:
- The effect of dbh is statistically significant and positive (beta = 0.62, 95%
CI [0.60, 0.64], t(998) = 60.79, p < .001; Std. beta = 0.89, 95% CI [0.86,
0.92])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
Generating table with model results: modelsummary
library("modelsummary")
Attaching package: 'modelsummary'
The following object is masked from 'package:parameters':
supported_models
The following object is masked from 'package:insight':
supported_models
modelsummary(m1, output = "html") ## Word, PDF, PowerPoint, png...
(1) | |
---|---|
(Intercept) | 19.339 |
(0.311) | |
dbh | 0.616 |
(0.010) | |
Num.Obs. | 1000 |
R2 | 0.787 |
R2 Adj. | 0.787 |
AIC | 5660.3 |
BIC | 5675.0 |
Log.Lik. | −2827.125 |
F | 3695.395 |
RMSE | 4.09 |
Generating table with model results: modelsummary
modelsummary(m1, fmt = 2,
estimate = "{estimate} ({std.error})",
statistic = NULL,
gof_map = c("nobs", "r.squared", "rmse"),
output = "html")
(1) | |
---|---|
(Intercept) | 19.34 (0.31) |
dbh | 0.62 (0.01) |
Num.Obs. | 1000 |
R2 | 0.787 |
RMSE | 4.09 |
Visualising fitted model
Plot model: visreg
library("visreg")
visreg(m1)
visreg
can use ggplot2 too
visreg(m1, gg = TRUE) + theme_bw()
Plot (easystats)
plot(estimate_expectation(m1))
Plot (modelsummary)
modelplot(m1)
Plot model parameters with easystats (see
package)
plot(parameters(m1), show_intercept = TRUE, show_labels = TRUE)
Plot parameters’ estimated distribution
plot(simulate_parameters(m1))
Model checking
Linear model assumptions
Linearity (transformations, GAM…)
Residuals:
- Independent
- Equal variance
- Normal
Negligible measurement error in predictors
Are residuals normal?
hist(residuals(m1))
SD = 4.09
Model checking: plot(model)
<- par(no.readonly = TRUE)
def.par layout(matrix(1:4, nrow=2))
plot(m1)
par(def.par)
Model checking with performance
(easystats)
check_model(m1)
https://easystats.github.io/performance/articles/check_model.html
A dashboard to explore the full model
model_dashboard(m1)
Making predictions with easystats
Estimate expected values
<- estimate_expectation(m1)
pred head(pred)
Model-based Expectation
dbh | Predicted | SE | 95% CI | Residuals
-----------------------------------------------------
29.68 | 37.61 | 0.13 | [37.36, 37.87] | -1.51
33.29 | 39.84 | 0.14 | [39.56, 40.11] | 2.46
28.03 | 36.60 | 0.13 | [36.34, 36.85] | 5.30
39.86 | 43.88 | 0.18 | [43.53, 44.23] | 2.62
47.94 | 48.86 | 0.24 | [48.38, 49.33] | -4.96
10.82 | 26.00 | 0.22 | [25.58, 26.42] | 0.20
Variable predicted: height
Expected values given DBH
plot(estimate_expectation(m1))
Calibration plot: observed vs predicted
$height.obs <- trees$height
predplot(height.obs ~ Predicted, data = pred, xlim = c(15, 60), ylim = c(15, 60))
abline(a = 0, b = 1)
Estimate prediction interval
Accounting for residual variation!
<- estimate_prediction(m1)
pred head(pred)
Model-based Prediction
dbh | Predicted | SE | 95% CI | Residuals
-----------------------------------------------------
29.68 | 37.61 | 4.09 | [29.58, 45.65] | -1.51
33.29 | 39.84 | 4.10 | [31.80, 47.87] | 2.46
28.03 | 36.60 | 4.09 | [28.56, 44.63] | 5.30
39.86 | 43.88 | 4.10 | [35.84, 51.92] | 2.62
47.94 | 48.86 | 4.10 | [40.81, 56.90] | -4.96
10.82 | 26.00 | 4.10 | [17.96, 34.04] | 0.20
Variable predicted: height
Confidence vs Prediction interval
plot(estimate_expectation(m1))
plot(estimate_prediction(m1))
Make predictions for new data
estimate_expectation(m1, data = data.frame(dbh = 39))
Model-based Expectation
dbh | Predicted | SE | 95% CI
-----------------------------------------
39.00 | 43.35 | 0.17 | [43.01, 43.69]
Variable predicted: height
estimate_prediction(m1, data = data.frame(dbh = 39))
Model-based Prediction
dbh | Predicted | SE | 95% CI
-----------------------------------------
39.00 | 43.35 | 4.10 | [35.31, 51.39]
Variable predicted: height
Workflow
Visualise data
Understand fitted model (
summary
)Visualise model (
visreg
…)Check model (
plot
,check_model
, calibration plot…)Predict (
predict
,estimate_expectation
,estimate_prediction
)
Categorical predictors (factors)
Q: Does tree height vary with sex?
boxplot(height ~ sex, data = trees)
Model height ~ sex
<- lm(height ~ sex, data = trees)
m2 summary(m2)
Call:
lm(formula = height ~ sex, data = trees)
Residuals:
Min 1Q Median 3Q Max
-22.6881 -6.7881 -0.0097 6.7261 22.3687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.9312 0.3981 92.778 <2e-16 ***
sexmale -0.8432 0.5607 -1.504 0.133
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.865 on 998 degrees of freedom
Multiple R-squared: 0.002261, Adjusted R-squared: 0.001261
F-statistic: 2.261 on 1 and 998 DF, p-value: 0.133
Linear model with categorical predictors
<- lm(height ~ sex, data = trees) m2
corresponds to
\[ \begin{aligned} Height_{i} = a + b_{male} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Model height ~ sex
<- lm(height ~ sex, data = trees)
m2 summary(m2)
Call:
lm(formula = height ~ sex, data = trees)
Residuals:
Min 1Q Median 3Q Max
-22.6881 -6.7881 -0.0097 6.7261 22.3687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.9312 0.3981 92.778 <2e-16 ***
sexmale -0.8432 0.5607 -1.504 0.133
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.865 on 998 degrees of freedom
Multiple R-squared: 0.002261, Adjusted R-squared: 0.001261
F-statistic: 2.261 on 1 and 998 DF, p-value: 0.133
Quiz
https://pollev.com/franciscorod726
Let’s read the model report…
report(m2)
We fitted a linear model (estimated using OLS) to predict height with sex
(formula: height ~ sex). The model explains a statistically not significant and
very weak proportion of variance (R2 = 2.26e-03, F(1, 998) = 2.26, p = 0.133,
adj. R2 = 1.26e-03). The model's intercept, corresponding to sex = female, is
at 36.93 (95% CI [36.15, 37.71], t(998) = 92.78, p < .001). Within this model:
- The effect of sex [male] is statistically non-significant and negative (beta
= -0.84, 95% CI [-1.94, 0.26], t(998) = -1.50, p = 0.133; Std. beta = -0.10,
95% CI [-0.22, 0.03])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
Estimated distribution of the intercept parameter
Intercept = Height of females
parameters(m2)[1,]
Parameter | Coefficient | SE | 95% CI | t(998) | p
-------------------------------------------------------------------
(Intercept) | 36.93 | 0.40 | [36.15, 37.71] | 92.78 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
plot(simulate_parameters(m2), show_intercept = TRUE)
Estimated distribution of the beta parameter
beta = height difference of males vs females
parameters(m2)[2,]
Parameter | Coefficient | SE | 95% CI | t(998) | p
----------------------------------------------------------------
sex [male] | -0.84 | 0.56 | [-1.94, 0.26] | -1.50 | 0.133
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
plot(simulate_parameters(m2))
Analysing differences among factor levels
estimate_means(m2)
We selected `at = c("sex")`.
Estimated Marginal Means
sex | Mean | SE | 95% CI
--------------------------------------
male | 36.09 | 0.39 | [35.31, 36.86]
female | 36.93 | 0.40 | [36.15, 37.71]
Marginal means estimated at sex
estimate_contrasts(m2)
No variable was specified for contrast estimation. Selecting `contrast = "sex"`.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(998) | p
--------------------------------------------------------------------
male | female | -0.84 | [-1.94, 0.26] | 0.56 | -1.50 | 0.133
Marginal contrasts estimated at sex
p-value adjustment method: Holm (1979)
Visualising the fitted model
Plot (visreg)
visreg(m2)
Plot (easystats)
plot(estimate_means(m2))
We selected `at = c("sex")`.
Model checking
Model checking: residuals
hist(resid(m2))
<- par(no.readonly = TRUE)
def.par layout(matrix(1:4, nrow=2))
plot(m2)
par(def.par)
Model checking (easystats)
check_model(m2)
Q: Does height differ among field sites?
Quiz
https://pollev.com/franciscorod726
Plot data first
plot(height ~ site, data = trees)
Linear model with categorical predictors
<- lm(height ~ site, data = trees) m3
\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Model Height ~ site
All right here?
<- lm(height ~ site, data = trees)
m3 summary(m3)
Call:
lm(formula = height ~ site, data = trees)
Residuals:
Min 1Q Median 3Q Max
-22.4498 -6.7049 0.0709 6.7537 23.0640
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.4636 0.4730 74.975 < 2e-16 ***
site 0.3862 0.1413 2.733 0.00639 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.842 on 998 degrees of freedom
Multiple R-squared: 0.007429, Adjusted R-squared: 0.006435
F-statistic: 7.47 on 1 and 998 DF, p-value: 0.006385
site is a factor!
$site <- as.factor(trees$site) trees
Model Height ~ site
<- lm(height ~ site, data = trees)
m3 summary(m3)
Call:
lm(formula = height ~ site, data = trees)
Residuals:
Min 1Q Median 3Q Max
-20.4416 -6.9004 0.0379 6.3051 19.7584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.8416 0.4266 79.329 < 2e-16 ***
site2 6.3411 0.7126 8.899 < 2e-16 ***
site3 4.9991 0.9828 5.086 4.36e-07 ***
site4 0.5329 0.9872 0.540 0.58949
site5 4.3723 0.9425 4.639 3.97e-06 ***
site6 4.7601 1.1709 4.065 5.18e-05 ***
site7 -0.7416 1.8506 -0.401 0.68871
site8 -0.6832 2.4753 -0.276 0.78258
site9 9.1709 3.0165 3.040 0.00243 **
site10 -0.5816 3.8013 -0.153 0.87843
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.446 on 990 degrees of freedom
Multiple R-squared: 0.1016, Adjusted R-squared: 0.09344
F-statistic: 12.44 on 9 and 990 DF, p-value: < 2.2e-16
Estimated parameter distributions
plot(simulate_parameters(m3))
Estimated tree heights for each site
estimate_means(m3)
We selected `at = c("site")`.
Estimated Marginal Means
site | Mean | SE | 95% CI
------------------------------------
1 | 33.84 | 0.43 | [33.00, 34.68]
2 | 40.18 | 0.57 | [39.06, 41.30]
3 | 38.84 | 0.89 | [37.10, 40.58]
4 | 34.37 | 0.89 | [32.63, 36.12]
5 | 38.21 | 0.84 | [36.56, 39.86]
6 | 38.60 | 1.09 | [36.46, 40.74]
7 | 33.10 | 1.80 | [29.57, 36.63]
8 | 33.16 | 2.44 | [28.37, 37.94]
9 | 43.01 | 2.99 | [37.15, 48.87]
10 | 33.26 | 3.78 | [25.85, 40.67]
Marginal means estimated at site
Plot estimated tree heights for each site
plot(estimate_means(m3))
We selected `at = c("site")`.
Analysing differences among factor levels
For finer control see emmeans
package
estimate_contrasts(m3)
No variable was specified for contrast estimation. Selecting `contrast = "site"`.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(990) | p
-----------------------------------------------------------------------
site1 | site10 | 0.58 | [-11.85, 13.01] | 3.80 | 0.15 | > .999
site1 | site2 | -6.34 | [ -8.67, -4.01] | 0.71 | -8.90 | < .001
site1 | site3 | -5.00 | [ -8.21, -1.78] | 0.98 | -5.09 | < .001
site1 | site4 | -0.53 | [ -3.76, 2.70] | 0.99 | -0.54 | > .999
site1 | site5 | -4.37 | [ -7.45, -1.29] | 0.94 | -4.64 | < .001
site1 | site6 | -4.76 | [ -8.59, -0.93] | 1.17 | -4.07 | 0.002
site1 | site7 | 0.74 | [ -5.31, 6.79] | 1.85 | 0.40 | > .999
site1 | site8 | 0.68 | [ -7.41, 8.78] | 2.48 | 0.28 | > .999
site1 | site9 | -9.17 | [-19.04, 0.69] | 3.02 | -3.04 | 0.090
site2 | site10 | 6.92 | [ -5.57, 19.42] | 3.82 | 1.81 | > .999
site2 | site3 | 1.34 | [ -2.10, 4.79] | 1.05 | 1.27 | > .999
site2 | site4 | 5.81 | [ 2.35, 9.27] | 1.06 | 5.49 | < .001
site2 | site5 | 1.97 | [ -1.35, 5.29] | 1.02 | 1.94 | > .999
site2 | site6 | 1.58 | [ -2.44, 5.61] | 1.23 | 1.28 | > .999
site2 | site7 | 7.08 | [ 0.90, 13.26] | 1.89 | 3.75 | 0.008
site2 | site8 | 7.02 | [ -1.17, 15.21] | 2.50 | 2.81 | 0.169
site2 | site9 | -2.83 | [-12.77, 7.11] | 3.04 | -0.93 | > .999
site3 | site10 | 5.58 | [ -7.11, 18.27] | 3.88 | 1.44 | > .999
site3 | site4 | 4.47 | [ 0.36, 8.57] | 1.26 | 3.56 | 0.015
site3 | site5 | 0.63 | [ -3.37, 4.62] | 1.22 | 0.51 | > .999
site3 | site6 | 0.24 | [ -4.35, 4.83] | 1.40 | 0.17 | > .999
site3 | site7 | 5.74 | [ -0.82, 12.30] | 2.01 | 2.86 | 0.151
site3 | site8 | 5.68 | [ -2.80, 14.17] | 2.59 | 2.19 | 0.804
site3 | site9 | -4.17 | [-14.36, 6.01] | 3.11 | -1.34 | > .999
site4 | site10 | 1.11 | [-11.58, 13.81] | 3.88 | 0.29 | > .999
site4 | site5 | -3.84 | [ -7.84, 0.16] | 1.22 | -3.14 | 0.067
site4 | site6 | -4.23 | [ -8.83, 0.38] | 1.41 | -3.00 | 0.099
site4 | site7 | 1.27 | [ -5.30, 7.84] | 2.01 | 0.63 | > .999
site4 | site8 | 1.22 | [ -7.27, 9.70] | 2.60 | 0.47 | > .999
site4 | site9 | -8.64 | [-18.83, 1.55] | 3.12 | -2.77 | 0.182
site5 | site10 | 4.95 | [ -7.70, 17.61] | 3.87 | 1.28 | > .999
site5 | site6 | -0.39 | [ -4.89, 4.11] | 1.38 | -0.28 | > .999
site5 | site7 | 5.11 | [ -1.39, 11.61] | 1.99 | 2.57 | 0.306
site5 | site8 | 5.06 | [ -3.38, 13.49] | 2.58 | 1.96 | > .999
site5 | site9 | -4.80 | [-14.94, 5.35] | 3.10 | -1.55 | > .999
site6 | site10 | 5.34 | [ -7.52, 18.20] | 3.93 | 1.36 | > .999
site6 | site7 | 5.50 | [ -1.38, 12.39] | 2.11 | 2.61 | 0.282
site6 | site8 | 5.44 | [ -3.29, 14.18] | 2.67 | 2.04 | > .999
site6 | site9 | -4.41 | [-14.81, 5.99] | 3.18 | -1.39 | > .999
site7 | site10 | -0.16 | [-13.85, 13.53] | 4.18 | -0.04 | > .999
site7 | site8 | -0.06 | [ -9.97, 9.85] | 3.03 | -0.02 | > .999
site7 | site9 | -9.91 | [-21.32, 1.49] | 3.49 | -2.84 | 0.155
site8 | site10 | -0.10 | [-14.80, 14.60] | 4.50 | -0.02 | > .999
site8 | site9 | -9.85 | [-22.46, 2.75] | 3.86 | -2.56 | 0.311
site9 | site10 | 9.75 | [ -5.99, 25.50] | 4.82 | 2.03 | > .999
Marginal contrasts estimated at site
p-value adjustment method: Holm (1979)
Analysing differences among factor levels
How different are site 2 and site 9?
library("marginaleffects")
hypotheses(m3, "site2 = site9")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
site2 = site9 -2.83 3.04 -0.931 0.352 1.5 -8.79 3.13
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Presenting model results
parameters(m3)
Parameter | Coefficient | SE | 95% CI | t(990) | p
-------------------------------------------------------------------
(Intercept) | 33.84 | 0.43 | [33.00, 34.68] | 79.33 | < .001
site [2] | 6.34 | 0.71 | [ 4.94, 7.74] | 8.90 | < .001
site [3] | 5.00 | 0.98 | [ 3.07, 6.93] | 5.09 | < .001
site [4] | 0.53 | 0.99 | [-1.40, 2.47] | 0.54 | 0.589
site [5] | 4.37 | 0.94 | [ 2.52, 6.22] | 4.64 | < .001
site [6] | 4.76 | 1.17 | [ 2.46, 7.06] | 4.07 | < .001
site [7] | -0.74 | 1.85 | [-4.37, 2.89] | -0.40 | 0.689
site [8] | -0.68 | 2.48 | [-5.54, 4.17] | -0.28 | 0.783
site [9] | 9.17 | 3.02 | [ 3.25, 15.09] | 3.04 | 0.002
site [10] | -0.58 | 3.80 | [-8.04, 6.88] | -0.15 | 0.878
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
modelsummary(m3, estimate = "{estimate} ({std.error})", statistic = NULL,
fmt = 1, gof_map = NA, coef_rename = paste0("site", 1:10), output = "html")
(1) | |
---|---|
site1 | 33.8 (0.4) |
site2 | 6.3 (0.7) |
site3 | 5.0 (1.0) |
site4 | 0.5 (1.0) |
site5 | 4.4 (0.9) |
site6 | 4.8 (1.2) |
site7 | −0.7 (1.9) |
site8 | −0.7 (2.5) |
site9 | 9.2 (3.0) |
site10 | −0.6 (3.8) |
Plot (visreg)
visreg(m3)
Plot (easystats)
plot(estimate_means(m3))
We selected `at = c("site")`.
Plot model (modelsummary)
modelplot(m3)
Plot model (easystats)
plot(parameters(m3), show_intercept = TRUE)
Fit model without intercept
<- lm(height ~ site - 1, data = trees)
m3bis summary(m3bis)
Call:
lm(formula = height ~ site - 1, data = trees)
Residuals:
Min 1Q Median 3Q Max
-20.4416 -6.9004 0.0379 6.3051 19.7584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
site1 33.8416 0.4266 79.329 <2e-16 ***
site2 40.1826 0.5707 70.404 <2e-16 ***
site3 38.8407 0.8854 43.868 <2e-16 ***
site4 34.3744 0.8903 38.610 <2e-16 ***
site5 38.2139 0.8404 45.469 <2e-16 ***
site6 38.6017 1.0904 35.401 <2e-16 ***
site7 33.1000 1.8007 18.381 <2e-16 ***
site8 33.1583 2.4382 13.599 <2e-16 ***
site9 43.0125 2.9862 14.404 <2e-16 ***
site10 33.2600 3.7773 8.805 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.446 on 990 degrees of freedom
Multiple R-squared: 0.95, Adjusted R-squared: 0.9495
F-statistic: 1879 on 10 and 990 DF, p-value: < 2.2e-16
plot(parameters(m3bis))
Model checking: residuals
<- par(no.readonly = TRUE)
def.par layout(matrix(1:4, nrow = 2))
plot(m3)
par(def.par)
Model checking: residuals
check_model(m3)
Combining continuous and categorical predictors
Predicting tree height based on dbh and site
lm(height ~ site + dbh, data = trees)
Call:
lm(formula = height ~ site + dbh, data = trees)
Coefficients:
(Intercept) site2 site3 site4 site5 site6
16.6990 6.5043 4.3575 1.9347 3.6374 4.2045
site7 site8 site9 site10 dbh
-0.1762 -5.3126 5.4370 2.2633 0.6171
corresponds to
\[ \begin{aligned} y_{i} = a + b_{site2} + c_{site3} + d_{site4} + e_{site5} +...+ k \cdot DBH_{i} + \varepsilon _{i} \\ \varepsilon _{i}\sim N\left( 0,\sigma^2 \right) \\ \end{aligned} \]
Predicting tree height based on dbh and site
<- lm(height ~ site + dbh, data = trees)
m4 summary(m4)
Call:
lm(formula = height ~ site + dbh, data = trees)
Residuals:
Min 1Q Median 3Q Max
-10.1130 -1.9885 0.0582 2.0314 11.3320
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.699037 0.260565 64.088 < 2e-16 ***
site2 6.504303 0.256730 25.335 < 2e-16 ***
site3 4.357457 0.354181 12.303 < 2e-16 ***
site4 1.934650 0.356102 5.433 6.98e-08 ***
site5 3.637432 0.339688 10.708 < 2e-16 ***
site6 4.204511 0.421906 9.966 < 2e-16 ***
site7 -0.176193 0.666772 -0.264 0.7916
site8 -5.312648 0.893603 -5.945 3.82e-09 ***
site9 5.437049 1.087766 4.998 6.84e-07 ***
site10 2.263338 1.369986 1.652 0.0988 .
dbh 0.617075 0.007574 81.473 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.043 on 989 degrees of freedom
Multiple R-squared: 0.8835, Adjusted R-squared: 0.8823
F-statistic: 750 on 10 and 989 DF, p-value: < 2.2e-16
Presenting model results
parameters(m4)
Parameter | Coefficient | SE | 95% CI | t(989) | p
-----------------------------------------------------------------------
(Intercept) | 16.70 | 0.26 | [16.19, 17.21] | 64.09 | < .001
site [2] | 6.50 | 0.26 | [ 6.00, 7.01] | 25.34 | < .001
site [3] | 4.36 | 0.35 | [ 3.66, 5.05] | 12.30 | < .001
site [4] | 1.93 | 0.36 | [ 1.24, 2.63] | 5.43 | < .001
site [5] | 3.64 | 0.34 | [ 2.97, 4.30] | 10.71 | < .001
site [6] | 4.20 | 0.42 | [ 3.38, 5.03] | 9.97 | < .001
site [7] | -0.18 | 0.67 | [-1.48, 1.13] | -0.26 | 0.792
site [8] | -5.31 | 0.89 | [-7.07, -3.56] | -5.95 | < .001
site [9] | 5.44 | 1.09 | [ 3.30, 7.57] | 5.00 | < .001
site [10] | 2.26 | 1.37 | [-0.43, 4.95] | 1.65 | 0.099
dbh | 0.62 | 7.57e-03 | [ 0.60, 0.63] | 81.47 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
Estimated tree heights for each site
estimate_means(m4)
We selected `at = c("site")`.
Estimated Marginal Means
site | Mean | SE | 95% CI
------------------------------------
1 | 33.90 | 0.15 | [33.60, 34.21]
2 | 40.41 | 0.21 | [40.01, 40.81]
3 | 38.26 | 0.32 | [37.64, 38.89]
4 | 35.84 | 0.32 | [35.21, 36.47]
5 | 37.54 | 0.30 | [36.95, 38.14]
6 | 38.11 | 0.39 | [37.34, 38.88]
7 | 33.73 | 0.65 | [32.45, 35.00]
8 | 28.59 | 0.88 | [26.86, 30.32]
9 | 39.34 | 1.08 | [37.23, 41.45]
10 | 36.17 | 1.36 | [33.50, 38.84]
Marginal means estimated at site
Fit model without intercept
<- lm(height ~ -1 + site + dbh, data = trees)
m4 summary(m4)
Call:
lm(formula = height ~ -1 + site + dbh, data = trees)
Residuals:
Min 1Q Median 3Q Max
-10.1130 -1.9885 0.0582 2.0314 11.3320
Coefficients:
Estimate Std. Error t value Pr(>|t|)
site1 16.699037 0.260565 64.09 <2e-16 ***
site2 23.203340 0.292773 79.25 <2e-16 ***
site3 21.056494 0.386532 54.48 <2e-16 ***
site4 18.633687 0.374456 49.76 <2e-16 ***
site5 20.336469 0.373942 54.38 <2e-16 ***
site6 20.903548 0.448913 46.56 <2e-16 ***
site7 16.522844 0.679936 24.30 <2e-16 ***
site8 11.386389 0.918198 12.40 <2e-16 ***
site9 22.136086 1.105970 20.02 <2e-16 ***
site10 18.962375 1.372158 13.82 <2e-16 ***
dbh 0.617075 0.007574 81.47 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.043 on 989 degrees of freedom
Multiple R-squared: 0.9935, Adjusted R-squared: 0.9934
F-statistic: 1.377e+04 on 11 and 989 DF, p-value: < 2.2e-16
Plot (visreg)
visreg(m4)
visreg(m4, xvar = "dbh", by = "site", overlay = TRUE, band = FALSE)
Plot model (easystats)
plot(parameters(m4))
Keeping sites only, dropping “dbh”
plot(parameters(m4, drop = "dbh"))
Plot model (modelsummary)
modelplot(m4)
Keeping sites only, dropping “dbh”
modelplot(m4, coef_omit = "dbh")
What happened to site 8?
visreg(m3)
visreg(m4, xvar = "site")
site 8 has the largest diameters:
boxplot(dbh ~ site, data = trees)
DBH
aggregate(trees$dbh ~ trees$site, FUN = mean)
trees$site trees$dbh
1 1 27.78033
2 2 27.51580
3 3 28.82011
4 4 25.50867
5 5 28.97119
6 6 28.68067
7 7 26.86409
8 8 35.28250
9 9 33.83125
10 10 23.17000
HEIGHT
aggregate(trees$height ~ trees$site, FUN = mean)
trees$site trees$height
1 1 33.84158
2 2 40.18265
3 3 38.84066
4 4 34.37444
5 5 38.21386
6 6 38.60167
7 7 33.10000
8 8 33.15833
9 9 43.01250
10 10 33.26000
We have fitted model w/ many intercepts and single slope
visreg(m4, xvar = "dbh", by = "site", overlay = TRUE, band = FALSE)
Slope is the same for all sites
parameters(m4, keep = "dbh")
Parameter | Coefficient | SE | 95% CI | t(989) | p
-------------------------------------------------------------------
dbh | 0.62 | 7.57e-03 | [0.60, 0.63] | 81.47 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
Model checking: residuals
<- par(no.readonly = TRUE)
def.par layout(matrix(1:4, nrow=2))
plot(m4)
par(def.par)
Model checking with easystats
check_model(m4)
How good is this model? Calibration plot
$height.pred <- fitted(m4)
treesplot(trees$height.pred, trees$height,
xlab = "Tree height (predicted)",
ylab = "Tree height (observed)",
las = 1, xlim = c(10,60), ylim = c(10,60))
abline(a = 0, b = 1)
How good is this model? Calibration plot (easystats)
<- estimate_expectation(m4)
pred $obs <- trees$height
predplot(obs ~ Predicted, data = pred, xlim = c(15, 60), ylim = c(15, 60))
abline(a = 0, b = 1)
Posterior predictive checking
Simulating response data from fitted model (yrep
)
and comparing with observed response (y
)
check_predictions(m4)
Predicting heights of new trees (easystats)
Using model for prediction
Expected height of 10-cm diameter tree in each site?
.10cm <- data.frame(site = as.factor(1:10),
treesdbh = 10)
.10cm trees
site dbh
1 1 10
2 2 10
3 3 10
4 4 10
5 5 10
6 6 10
7 7 10
8 8 10
9 9 10
10 10 10
Using model for prediction
Expected height of 10-cm DBH trees at each site
<- estimate_expectation(m4, data = trees.10cm)
pred pred
Model-based Expectation
site | dbh | Predicted | SE | 95% CI
------------------------------------------------
1 | 10.00 | 22.87 | 0.20 | [22.47, 23.27]
2 | 10.00 | 29.37 | 0.24 | [28.89, 29.85]
3 | 10.00 | 27.23 | 0.35 | [26.54, 27.91]
4 | 10.00 | 24.80 | 0.34 | [24.13, 25.47]
5 | 10.00 | 26.51 | 0.34 | [25.85, 27.16]
6 | 10.00 | 27.07 | 0.42 | [26.25, 27.89]
7 | 10.00 | 22.69 | 0.66 | [21.40, 23.99]
8 | 10.00 | 17.56 | 0.90 | [15.79, 19.32]
9 | 10.00 | 28.31 | 1.09 | [26.17, 30.45]
10 | 10.00 | 25.13 | 1.36 | [22.46, 27.81]
Variable predicted: height
Using model for prediction
Prediction intervals (accounting for residual variance)
<- estimate_prediction(m4, data = trees.10cm)
pred pred
Model-based Prediction
site | dbh | Predicted | SE | 95% CI
------------------------------------------------
1 | 10.00 | 22.87 | 3.05 | [16.88, 28.85]
2 | 10.00 | 29.37 | 3.05 | [23.38, 35.36]
3 | 10.00 | 27.23 | 3.06 | [21.22, 33.24]
4 | 10.00 | 24.80 | 3.06 | [18.80, 30.81]
5 | 10.00 | 26.51 | 3.06 | [20.50, 32.51]
6 | 10.00 | 27.07 | 3.07 | [21.05, 33.10]
7 | 10.00 | 22.69 | 3.11 | [16.58, 28.80]
8 | 10.00 | 17.56 | 3.17 | [11.33, 23.78]
9 | 10.00 | 28.31 | 3.23 | [21.96, 34.65]
10 | 10.00 | 25.13 | 3.33 | [18.59, 31.68]
Variable predicted: height
Q: Does allometric relationship between Height and Diameter vary among sites?
<- data.frame(dbh = seq(10, 50, by = 1),
df height = seq(20, 60, by = 1))
plot(height ~ dbh, data = df, type = "n")
abline(a = 25, 0.6)
abline(a = 40, b = 0.1, col = "steelblue")
abline(a = 50, b = -0.3, col = "orangered")
Model with interactions
<- lm(height ~ site*dbh, data = trees)
m5 summary(m5)
Call:
lm(formula = height ~ site * dbh, data = trees)
Residuals:
Min 1Q Median 3Q Max
-10.1017 -1.9839 0.0645 2.0486 11.1789
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.359437 0.360054 45.436 < 2e-16 ***
site2 7.684781 0.609657 12.605 < 2e-16 ***
site3 4.518568 0.867008 5.212 2.28e-07 ***
site4 2.769336 0.813259 3.405 0.000688 ***
site5 3.917607 0.870983 4.498 7.68e-06 ***
site6 4.155161 1.009379 4.117 4.17e-05 ***
site7 -2.306799 1.551303 -1.487 0.137334
site8 -2.616095 4.090671 -0.640 0.522630
site9 2.621560 5.073794 0.517 0.605492
site10 4.662340 2.991072 1.559 0.119378
dbh 0.629299 0.011722 53.685 < 2e-16 ***
site2:dbh -0.042784 0.020033 -2.136 0.032950 *
site3:dbh -0.006031 0.027640 -0.218 0.827312
site4:dbh -0.031633 0.028225 -1.121 0.262677
site5:dbh -0.010173 0.027887 -0.365 0.715334
site6:dbh 0.001337 0.032109 0.042 0.966797
site7:dbh 0.079728 0.052056 1.532 0.125951
site8:dbh -0.079027 0.113386 -0.697 0.485984
site9:dbh 0.081035 0.146649 0.553 0.580679
site10:dbh -0.101107 0.114520 -0.883 0.377522
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.041 on 980 degrees of freedom
Multiple R-squared: 0.8847, Adjusted R-squared: 0.8825
F-statistic: 395.7 on 19 and 980 DF, p-value: < 2.2e-16
Does slope vary among sites?
visreg(m5, xvar = "dbh", by = "site")
visreg(m5, xvar = "dbh", by = "site", overlay = TRUE, band = FALSE)
END