This is a tutorial for applying the AIC-type criterion GORICA, using
the goric function, to a lavaan object. The
GORICA is an information criterion that can be used to evaluate
theory-driven, informative, order-restricted hypotheses. The GORICA
selects the best hypothesis out of a given set. Below, you will find
examples for the use of the goric function in the
restriktor R package when you have a lavaan object.
More information (tutorials and example R scripts) is available from
this github page of
Rebecca,
including a tutorial on interpreting the GORIC(A) output (called
‘Guidelines_output_GORIC’).
First, install and load the lavaan library and the
restriktor library (for the goric function).
If needed, you can view the description of a function with
the ? operator or the help command.
# To install restriktor in R:
# if (!require('restriktor')) install.packages('restriktor')
library(restriktor)
# Print the docs in the help tab to view the arguments and explanations for the function:
# ?goric
# To install lavaan in R: if (!require('lavaan')) install.packages('lavaan')
library(lavaan)
In this example, we will look at the Confirmatory Factor Analysis (CFA) lavaan example.
To specify your hypotheses in terms of model parameters, you should
give your own labels to estimates by including them in the
lavaan model syntax.
# specify the model, using own labeling
HS.model <- " visual =~ lambda_v1*x1 + lambda_v2*x2 + lambda_v3*x3
textual =~ lambda_t1*x4 + lambda_t2*x5 + lambda_t3*x6
speed =~ lambda_s1*x7 + lambda_s2*x8 + lambda_s3*x9 "
Next, we need to formulate the hypothesis of interest. Let us say
that the hypothesis is that all standardized factor loadings are at
least .6 (in an absolute sense). Note that we compare the factor loadings
to a number, which is often most meaningful when inspecting
standardized factor loadings; therefore, use standardized = TRUE
in the goric function. As another note, it is also possible
to compare the sizes of (the absolute values of) various standardized factor
loadings to each other (see the hypothetical sketch after the hypothesis block
below). Since we are interested in one theory-based hypothesis, we
compare it to its complement (reflecting all other possibilities),
which is done by default in the goric function (i.e., by
default, comparison = "complement").
# Hypotheses
H1_cfa <- "
abs(lambda_v1) > .6; abs(lambda_v2) > .6; abs(lambda_v3) > .6;
abs(lambda_t1) > .6; abs(lambda_t2) > .6; abs(lambda_t3) > .6;
abs(lambda_s1) > .6; abs(lambda_s2) > .6; abs(lambda_s3) > .6;
"
# vs its complement (default)
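As noted above, one could also compare the (absolute) sizes of the standardized loadings to each other instead of to a bound. Below is a hypothetical sketch of such a hypothesis; it is not evaluated here, and the choice of which loadings to compare is purely for illustration.
# Hypothetical alternative hypothesis (sketch, not run): within the visual factor, x1 loads strongest
# H2_cfa <- "abs(lambda_v1) > abs(lambda_v2); abs(lambda_v1) > abs(lambda_v3)"
# This could be evaluated against its complement in the same way as H1_cfa below.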
Then, we fit the confirmatory factor analysis (CFA) model using the
cfa function from lavaan:
# fit the model fixing the variances of all the latent variables in a CFA model
# to unity (std.lv = TRUE; https://www.lavaan.ugent.be/tutorial/syntax2.html)
fit_cfa <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)
Now, we can call the goric function. Because we enter
a lavaan object, the GORICA (i.e., type = "gorica") is used by
default.
# Calculate GORICA values and weights for H1_cfa and its complement.
set.seed(100) # Needed for reproducibility & sensitivity check
results_cfa <- goric(fit_cfa, hypotheses = list(H1_cfa = H1_cfa), standardized = TRUE)
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# summary(results_cfa)
results_cfa
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights
1 H1_cfa 16.167 4.478 -23.378 0.011 0.989 0.497
2 complement 20.685 8.984 -23.402 0.989 0.011 0.503
Conclusion:
The order-restricted hypothesis 'H1_cfa' has less support (namely, 0.497 / 0.503 < 1 times more support) than its complement. That is, the complement has 1.01 times more support than 'H1_cfa'
Conclusion: The order-restricted hypothesis ‘H1_cfa’ has less (\(< 1\) times more) support than its complement. Thus, we did not find support for our hypothesis; that is, there is at least one standardized factor loading smaller than .6 (in an absolute sense).
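As an aside, the GORIC(A) weights in the output are obtained from the GORICA values in the same way Akaike weights are obtained from AIC values, that is, \(w_m = \exp(-0.5\,\text{GORICA}_m) / \sum_k \exp(-0.5\,\text{GORICA}_k)\). A minimal sketch reproducing the weights from the GORICA values printed above:
# Sketch: recompute the GORICA weights from the printed GORICA values
gorica_values <- c(H1_cfa = -23.378, complement = -23.402)
round(exp(-0.5 * gorica_values) / sum(exp(-0.5 * gorica_values)), 3)
# gives (approximately): H1_cfa 0.497, complement 0.503, matching the gorica.weights above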
In this example, we will extend the CFA with a multi-group structure. There are two groups: 1. Pasteur and 2. Grant-White. Each group will have its own parameter estimates and thus also needs its own labels. In this example, we will evaluate a hypothesis regarding the latent means of the two schools. Before we do so, we need to establish measurement invariance, such that we can fairly compare the means.
We could use the GORICA to check for measurement invariance. When you
can specify ranges of values within which the parameters, say, loadings,
are considered the same for each group, then there is an added value of
using the GORICA (see the hypothetical sketch below). Here, we will inspect
the case where we set the parameters, say, loadings, to be exactly equal
across groups. Then, one can use the AIC as well, where we will also
determine the AIC weights for a more in-depth comparison.
Note that we compare parameters across groups, so they are already on a
comparable scale, since they denote the same relationship in each group.
Therefore, we do not need the standardized values.
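For illustration, a range-based (‘approximately equal loadings’) hypothesis could look as follows. This is a hypothetical sketch and is not evaluated here: the per-group labels (e.g., l_v1_P, l_v1_GW) would have to be added to the model syntax, and the tolerance of 0.2 is an arbitrary choice that you would need to motivate.
# Hypothetical sketch (not run): 'approximately equal' loadings of the visual factor across groups.
# Requires a model with per-group labels, e.g.: visual =~ c(l_v1_P, l_v1_GW)*x1 + ...
# H_approx.invar <- "
#   l_v1_P - l_v1_GW < 0.2; l_v1_P - l_v1_GW > -0.2;
#   l_v2_P - l_v2_GW < 0.2; l_v2_P - l_v2_GW > -0.2;
#   l_v3_P - l_v3_GW < 0.2; l_v3_P - l_v3_GW > -0.2
# "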
# specify the model
HS.model_mgcfa <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
# configural invariance
fit_ci <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school")
# weak invariance
fit_wi <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = "loadings")
# strong invariance
fit_si <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = c("intercepts",
"loadings"))
# strict invariance
fit_strict <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school",
group.equal = c("intercepts", "loadings", "residuals"))
# model comparison with AIC
AIC_meas.invar <- lavTestLRT(fit_ci, fit_wi, fit_si, fit_strict)$AIC
hypo.names <- c("configural", "weak", "strong", "strict")
AIC_weights <- calc_ICweights(AIC_meas.invar, hypo.names)
# print(AIC_weights, use_scientific = FALSE, digits = 3)
AIC_weights$ratio_IC_weights
vs_configural vs_weak vs_strong vs_strict
configural 1.000000e+00 1.489865e-01 1.845625e+05 1.373106e+05
weak 6.712019e+00 1.000000e+00 1.238787e+06 9.216313e+05
strong 5.418218e-06 8.072412e-07 1.000000e+00 7.439787e-01
strict 7.282760e-06 1.085033e-06 1.344124e+00 1.000000e+00
AIC_weights$ratio_IC_weights["weak", ]
vs_configural vs_weak vs_strong vs_strict
6.712019e+00 1.000000e+00 1.238787e+06 9.216313e+05
Weak invariance (equal factor loadings) is the best of the set, since it has the lowest AIC value. From the AIC weights, we can see that it has about 6.7 times more support than configural invariance, and is overwhelmingly more supported than strong and strict invariance. Therefore, it is unwise to directly compare the values of the latent means across the two groups.
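For reference, these AIC weights and their ratios can also be computed by hand from the AIC values, using the standard Akaike-weight formula; a minimal sketch, assuming AIC_meas.invar and hypo.names as defined above:
# Sketch: Akaike weights and their pairwise ratios computed by hand
delta_AIC <- AIC_meas.invar - min(AIC_meas.invar)             # AIC differences w.r.t. the best model
w_AIC <- exp(-0.5 * delta_AIC) / sum(exp(-0.5 * delta_AIC))   # normalized (Akaike) weights
names(w_AIC) <- hypo.names
round(w_AIC, 3)
outer(w_AIC, w_AIC, "/")  # pairwise ratios of weights, comparable to ratio_IC_weights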
To nevertheless show how one can evaluate hypotheses about the latent
means using the GORICA, we proceed here. From here on, we assume that
strong (or even strict) measurement invariance has been established,
such that we can fairly compare the latent means. Let us say that our
expectation is that the latent means of our three factors are, in an
absolute sense, higher in the Pasteur group than in the Grant-White
group.
First, we will create the model such that we obtain these latent means.
Second, we specify our hypothesis. Third, we run the model. Fourth, we
evaluate our hypothesis using the GORICA.
# specify the model, using own labeling (per group!), such that
# we obtain the latent means (and not differences w.r.t. a reference category)
HS.model_mgcfa <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
# latent variable intercepts; here:
# the latent means
visual ~ c(mean_v_P, mean_v_GW) * 1
textual ~ c(mean_t_P, mean_t_GW) * 1
speed ~ c(mean_s_P, mean_s_GW) * 1
"
# where below we ensure: intercepts equal across groups & loadings equal across
# groups (group.equal = c('intercepts', 'loadings')) and first intercept equal
# to 0 (marker.int.zero = TRUE)
# Hypotheses
H1_mgcfa <- "
abs(mean_v_P) > abs(mean_v_GW);
abs(mean_t_P) > abs(mean_t_GW);
abs(mean_s_P) > abs(mean_s_GW)
"
# vs its complement (default)
# fit the model
fit_mgcfa <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school",
group.equal = c("intercepts", "loadings"), marker.int.zero = TRUE)
# Calculate GORICA values and weights
set.seed(100) # Needed for reproducibility & sensitivity check
results_mgcfa <- goric(fit_mgcfa, hypotheses = list(H1_mgcfa = H1_mgcfa))
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# Alternative: In case you want to extract the estimates and their
# covariance matrix yourself and use those as input for the goric function:
# Extract estimates and their covariance matrix
names_param <- c("mean_v_P", "mean_v_GW", "mean_t_P", "mean_t_GW", "mean_s_P", "mean_s_GW")
est <- coef(fit_mgcfa)[names_param]
VCOV <- vcov(fit_mgcfa)[names_param, names_param]
# Calculate GORICA values and weights
set.seed(100) # Needed for reproducibility & sensitivity check
results_mgcfa <- goric(est, VCOV = VCOV, hypotheses = list(H1_mgcfa = H1_mgcfa))
# GORICA values and weights
# summary(results_mgcfa)
results_mgcfa
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights
1 H1_mgcfa -2.500 4.774 14.549 0.000 0.649 0.000
2 complement 9.594 5.391 -8.407 1.000 0.351 1.000
Conclusion:
The order-restricted hypothesis 'H1_mgcfa' has less support (namely, 0.000 / 1.000 < 1 times more support) than its complement. That is, the complement has Inf times more support than 'H1_mgcfa'
Conclusion: The order-restricted hypothesis ‘H1_mgcfa’ has less support (namely, \(< 1\) times more support) than its complement. That is, we did not find support for the latent means being, in an absolute sense, higher in the Pasteur group than in the Grant-White group; instead, the complement is clearly preferred.
In this example, we will look at the structural equation modeling (SEM) lavaan example.
Here, we assume that we are interested in (some of) the regression
parameters; so, we only need to label these. We want to compare the
strengths of the predictive relationships; hence, we need standardized
estimates for a meaningful comparison; that is, add
standardized = TRUE in the goric function. One
can compare the strengths of relationships of various independent variables
for a specific dependent variable, but also across different dependent
variables (see the hypothetical sketch after the hypothesis block below).
Here, we compare the strengths of relationships for one dependent
variable, namely dem65. More specifically, we expect that the predictive
relationship of dem60 for dem65 (beta_dem65_dem60) is (in an absolute
sense) stronger than that of ind60 for dem65 (beta_dem65_ind60).
# specify the model, using own labeling
model_sem <- "
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ beta_dem65_ind60*ind60 + beta_dem65_dem60*dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
"
# Hypotheses
H1_sem <- "
abs(beta_dem65_dem60) > abs(beta_dem65_ind60)
"
# vs its complement (default)
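As mentioned above, standardized estimates would also allow comparing strengths across dependent variables. A hypothetical sketch (not evaluated here; it assumes the regression of dem60 on ind60 is also labeled in the model syntax, e.g., as beta_dem60_ind60, and the model is refitted accordingly):
# Hypothetical sketch (not run): requires 'dem60 ~ beta_dem60_ind60*ind60' in the model syntax
# H2_sem <- "abs(beta_dem60_ind60) > abs(beta_dem65_ind60)"
# results_sem2 <- goric(fit_sem2, hypotheses = list(H2_sem = H2_sem), standardized = TRUE)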
# fit the model
fit_sem <- sem(model_sem, data = PoliticalDemocracy)
# Calculate GORICA values and weights
set.seed(100) # Needed for reproducibility & sensitivity check
results_sem <- goric(fit_sem, hypotheses = list(H1_sem = H1_sem), standardized = TRUE)
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# summary(results_sem)
results_sem
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights
1 H1_sem 4.348 1.500 -5.697 1.000 0.500 1.000
2 complement -14.069 1.500 31.137 0.000 0.500 0.000
Conclusion:
The order-restricted hypothesis 'H1_sem' has 99620500.72 times more support than its complement.
# Unrestricted & order-restricted parameter estimates of interest.
# Note: Hypotheses should be specified before seeing the data. Looking afterwards
# at the standardized estimates can give you more insight.
# Unrestricted parameter estimates of interest:
subset(standardizedSolution(fit_sem), op == "~" & lhs == "dem65")
lhs op rhs label est.std se z pvalue ci.lower ci.upper
13 dem65 ~ ind60 beta_dem65_ind60 0.182 0.070 2.587 0.01 0.044 0.320
14 dem65 ~ dem60 beta_dem65_dem60 0.885 0.051 17.405 0.00 0.786 0.985
# Order-restricted parameter estimates of interest; that is, the estimates of
# interest under the restrictions in the hypotheses:
results_sem$ormle
$b.restr
beta_dem65_ind60 beta_dem65_dem60
H1_sem 0.1822593 0.8852290
complement 0.5959879 0.5959879
Conclusion: The order-restricted hypothesis ‘H1_sem’ has (\(> 1\) times) more support than its complement. Thus, we found support for the hypothesis that the predictive relationship of dem60 for dem65 is (in an absolute sense) stronger than that of ind60 for dem65.
In this example, we will look at the multilevel/two-level SEM lavaan example.
Here, we assume that we are interested in (some of) the regression
parameters; so, we only need to label these. We want to compare the
strengths of the predictive relationships. Hence, we need standardized
estimates for a meaningful comparison; that is, add
standardized = TRUE in the goric function. We
expect that the (absolute) strength/size of the predictive
relationships for fw (the beta parameters) is highest for x1, then x2,
and then x3, that is, abs(beta_x1) > abs(beta_x2) > abs(beta_x3).
# specify the model, using own labeling
model_msem <- "
level: 1
fw =~ y1 + y2 + y3
fw ~ beta_x1*x1 + beta_x2*x2 + beta_x3*x3
level: 2
fb =~ y1 + y2 + y3
fb ~ w1 + w2
"
# Hypotheses
H1_msem <- "
abs(beta_x1) > abs(beta_x2) > abs(beta_x3)
"
# vs its complement (default)
# fit the model
fit_msem <- sem(model = model_msem, data = Demo.twolevel, cluster = "cluster")
# Calculate GORICA values and weights
set.seed(100) # Needed for reproducibility & sensitivity check
results_msem <- goric(fit_msem, hypotheses = list(H1_msem = H1_msem), standardized = TRUE)
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# summary(results_msem)
results_msem
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights
1 H1_msem 9.146 1.827 -14.637 0.995 0.700 0.998
2 complement 3.848 2.673 -2.350 0.005 0.300 0.002
Conclusion:
The order-restricted hypothesis 'H1_msem' has 465.55 times more support than its complement.
Conclusion: The order-restricted hypothesis ‘H1_msem’ has (\(> 1\) times) more support than its complement. Thus, we found support for the hypothesis that the (absolute) strength of the predictive relationships for fw is highest for x1, then x2, and then x3, that is, abs(beta_x1) > abs(beta_x2) > abs(beta_x3).
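Analogous to the SEM example above, one could afterwards inspect the estimates of interest. A brief sketch (not run here); remember that hypotheses should be specified before seeing the data:
# Order-restricted estimates of interest (under the restrictions in H1_msem)
# results_msem$ormle$b.restr
# Unrestricted standardized estimates of the within-level regressions
# subset(standardizedSolution(fit_msem), op == "~" & lhs == "fw")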
In this example, we will look at the linear growth model with a time-varying covariate lavaan example.
Here, we assume that we are interested in the predictive strength of the two predictors (x1 and x2) on the slope (s); so, we only need to label these. We want to compare the strengths of these predictive relationships; hence, we again use standardized = TRUE in the goric function. We expect that the strength of x2 on the slope (s_2) is (in an absolute sense) higher than that of x1 (s_1).
# specify the model, using own labeling
# (a linear growth model with a time-varying covariate)
model_growth <- "
# intercept and slope with fixed coefficients
i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
# regressions
i ~ x1 + x2
s ~ s_1*x1 + s_2*x2
# time-varying covariates
t1 ~ c1
t2 ~ c2
t3 ~ c3
t4 ~ c4
## Extra: Label intercepts
#i ~ intercept_i * 1
#s ~ intercept_s * 1
"
# Hypotheses
H1_growth <- "
abs(s_2) > abs(s_1)
"
# vs its complement (default)
# fit the model
fit_growth <- growth(model_growth, data = Demo.growth)
# Calculate GORICA values and weights
set.seed(100) # Needed for reproducibility & sensitivity check
results_growth <- goric(fit_growth, hypotheses = list(H1_growth = H1_growth), standardized = TRUE)
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# summary(results_growth)
results_growth
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights
1 H1_growth 5.210 1.500 -7.419 1.000 0.500 1.000
2 complement -8.912 1.500 20.824 0.000 0.500 0.000
Conclusion:
The order-restricted hypothesis 'H1_growth' has 1358030.63 times more support than its complement.
Conclusion: The order-restricted hypothesis ‘H1_growth’ has much more (\(>> 1\) times more) support than its complement. Thus, we found support for the hypothesis that the predictive strength of x2 on the slope s (s_2) is (in an absolute sense) higher than that of x1 (s_1).
In this example, we will look at the mediation lavaan example.
Let us say we want to evaluate whether there is partial mediation or full mediation. It can be helpful to specify what you believe is the minimum effect. Here, we assume that an (in)direct effect between -0.1 and 0.1 can be seen as no effect. More examples can be found on this github page of Rebecca.
# Create data
set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
# specify the model, using own labeling
model_med <- "
# direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
indirect := a*b
# direct effect (c)
direct := c
"
# Hypotheses
H_part <- "abs(indirect) > 0.1; abs(direct) > 0.1"
H_full <- "abs(indirect) > 0.1; -0.1 < direct < 0.1"
# and unconstrained as failsafe (default)
# fit the model
fit_med <- sem(model_med, data = Data)
# summary(fit_med, standardized = TRUE)
# Calculate GORICA values and weights: 1. based on the object
set.seed(100) # Needed for reproducibility & sensitivity check
results_med <- goric(fit_med, hypotheses = list(H_part = H_part, H_full = H_full),
standardized = TRUE)
restriktor Message: The covariance matrix of the estimates was obtained via 'vcov()'. This is the biased (restricted) sample covariance matrix, not the unbiased version based on the total sample size ('N').
# summary(results_med)
results_med
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights gorica.weights_without_unc
1 H_part 8.382 3.922 -8.919 0.249 0.349 0.262 0.303
2 H_full 8.795 3.500 -10.590 0.376 0.532 0.604 0.697
3 unconstrained 8.795 5.000 -7.590 0.376 0.119 0.135
Conclusion:
- The order-restricted hypothesis 'H_full' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H_full' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H_full' against the other order-restricted hypotheses:
* 'H_full' is 2.306 times more supported than 'H_part'.
# 2. based on extracted estimates:
# Extract standardized estimates of the defined parameters and their covariance matrix
label_names <- c("direct", "indirect")
est <- standardizedSolution(fit_med)$est.std
names(est) <- standardizedSolution(fit_med)$label
est <- est[label_names]
# est
VCOV <- lavInspect(fit_med, "vcov.def.std.all")[label_names, label_names] # VCOV matrix of parameters
# GORICA
set.seed(123) # for reproducibility & possibly sensitivity check
results_med <- goric(est, VCOV = VCOV, hypotheses = list(H_part = H_part, H_full = H_full))
# summary(results_med)
results_med
restriktor (0.6-20): generalized order-restricted information criterion approximation:
Results:
model loglik penalty gorica loglik.weights penalty.weights gorica.weights gorica.weights_without_unc
1 H_part 3.098 0.922 -4.351 0.249 0.349 0.262 0.303
2 H_full 3.511 0.500 -6.022 0.376 0.532 0.604 0.697
3 unconstrained 3.511 2.000 -3.022 0.376 0.119 0.135
Conclusion:
- The order-restricted hypothesis 'H_full' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H_full' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H_full' against the other order-restricted hypotheses:
* 'H_full' is 2.306 times more supported than 'H_part'.