Get Started

Shu Fai Cheung

2023-10-31

Introduction

The package semlbci (Cheung & Pesigan, 2023) includes functions for finding the likelihood-based confidence intervals (LBCIs) of parameters in the output of a structural equation modeling (SEM) function. Currently, it supports the output from lavaan::lavaan() and its wrappers, such as lavaan::sem() and lavaan::cfa().

The latest stable version can be installed from GitHub:

remotes::install_github("sfcheung/semlbci")

Further information about semlbci can be found in Cheung and Pesigan (2023).

Fit a Model to a Dataset

The package has a dataset, simple_med, with three variables, x, m, and y. Let us fit a simple mediation model to this dataset.

library(semlbci)
data(simple_med)
dat <- simple_med
head(dat)
#>            x          m         y
#> 1 -0.3447375   7.284273 -5.636897
#> 2 -0.3658919  -5.449121 -4.525402
#> 3 -0.8294968  -7.016254 -7.823257
#> 4 -0.3389654   4.367018  1.563098
#> 5 -0.9628162  -4.015469 -7.288511
#> 6 -1.0749302 -11.538140 -4.153572
library(lavaan)
#> This is lavaan 0.6-16
#> lavaan is FREE software! Please report any bugs.
mod <-
"
m ~ a*x
y ~ b*m
ab := a * b
"
# We set fixed.x = FALSE because we will also find the LBCIs for
# standardized solution
fit <- sem(mod, simple_med, fixed.x = FALSE)

To illustrate how to find the LBCIs for user-defined parameters, we labelled the m ~ x path by a, the y ~ m path by b, and defined the indirect effect, ab, by a * b.

This is the summary:

summary(fit, standardized = TRUE)
#> lavaan 0.6.16 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>   Number of observations                           200
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                10.549
#>   Degrees of freedom                                 1
#>   P-value (Chi-square)                           0.001
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   m ~                                                                   
#>     x          (a)    1.676    0.431    3.891    0.000    1.676    0.265
#>   y ~                                                                   
#>     m          (b)    0.535    0.073    7.300    0.000    0.535    0.459
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .m                34.710    3.471   10.000    0.000   34.710    0.930
#>    .y                40.119    4.012   10.000    0.000   40.119    0.790
#>     x                 0.935    0.094   10.000    0.000    0.935    1.000
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>     ab                0.897    0.261    3.434    0.001    0.897    0.122

Examples

The main function to find the LBCIs for free parameters is semlbci(). This should be the only function used by normal users. We will first illustrate its usage by some examples, and then present other technical details in the following section.

Find the LBCI for a free parameter

All free parameters can be specified in lavaan style. For example, the path from m to y is denoted by "y ~ m", and the covariance or correlation between x and m (not in the example) is denoted by "x ~~ m" (order does not matter).

out <- semlbci(sem_out = fit,
               pars = c("y ~ m",
                        "m ~ x"))

The output is the parameter table of the fitted lavaan object, with two columns added, lbci_lb and lbci_ub, the likelihood-based lower bounds and upper bounds, respectively.

out
#> 
#> Results:
#>   id lhs op rhs label   est lbci_lb lbci_ub    lb    ub cl_lb cl_ub
#> 1  1   m  ~   x     a 1.676   0.828   2.525 0.832 2.520 0.950 0.950
#> 2  2   y  ~   m     b 0.535   0.391   0.679 0.391 0.679 0.950 0.950
#> 
#> Annotation:
#> * lbci_lb, lbci_ub: The lower and upper likelihood-based bounds.
#> * est: The point estimates from the original lavaan output.
#> * lb, ub: The original lower and upper bounds, extracted from the
#>     original lavaan output. Usually Wald CIs for free parameters and
#>     delta method CIs for user-defined parameters
#> * cl_lb, cl_ub: One minus the p-values of chi-square difference tests
#>     at the bounds. Should be close to the requested level of
#>     confidence, e.g., .95 for 95% confidence intervals.
#> 
#> Call:
#> semlbci(sem_out = fit, pars = c("y ~ m", "m ~ x"))

In this example, the point estimate of the unstandardized coefficient from x to m is 1.676, and the LBCI is 0.828 to 2.525.

Customizing the printout

For users familiar with the column names, the annotation can be disabled by calling print() and add annotation = FALSE:

print(out, annotation = FALSE)
#> 
#> Results:
#>   id lhs op rhs label   est lbci_lb lbci_ub    lb    ub cl_lb cl_ub
#> 1  1   m  ~   x     a 1.676   0.828   2.525 0.832 2.520 0.950 0.950
#> 2  2   y  ~   m     b 0.535   0.391   0.679 0.391 0.679 0.950 0.950

The results can also be printed in a lavaan-like format by calling print(), setting sem_out to the original fit object (fit in this example), and add output = "lavaan":

print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   m ~                                                                   
#>     x          (a)    1.676    0.431    3.891    0.000    0.828    2.525
#>   y ~                                                                   
#>     m          (b)    0.535    0.073    7.300    0.000    0.391    0.679

By default, the original confidence intervals will not be printed. See the help page of print.semlbci() for other options available.

Find the LBCI for a User-Defined Parameter

To find the LBCI for a user-defined parameter, use label :=, where label is the label used in the model specification. The definition of this parameter can be omitted. The content after := will be ignored by semlbci().

out <- semlbci(sem_out = fit,
               pars = c("ab := "))
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.897    0.261    3.434    0.001    0.427    1.464

In this example, the point estimate of the indirect effect is 0.897, and the LBCI is 0.427 to 1.464.

(Note: In some examples, we added annotation = FALSE to suppress the annotation in the printout to minimize the length of this vignette.)

Find the LBCI for the Parameters in the Standardized Metric

By the default, the unstandardized solution is used by semlbci(). If the LBCIs for the standardized solution solution are needed, set standardized = TRUE.

out <- semlbci(sem_out = fit,
               pars = c("y ~ m",
                        "m ~ x"),
               standardized = TRUE)
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   m ~                                                                   
#>     x          (a)    0.265    0.066    4.035    0.000    0.133    0.389
#>   y ~                                                                   
#>     m          (b)    0.459    0.056    8.215    0.000    0.342    0.561

If LBCIs are for the standardized solution and output set to "lavaan" when printing the results, the parameter estimates, standard errors, z-values, and p-values are those from the standardized solution.

The LBCIs for standardized user-defined parameters can be requested similarly.

out <- semlbci(sem_out = fit,
               pars = c("ab :="),
               standardized = TRUE)
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Defined Parameters:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.122    0.034    3.538    0.000    0.059    0.194

Basic Arguments in semlbci()

sem_out and pars: The fit object and the parameters

The only required argument for semlbci() is sem_out, the fit object from lavaan::lavaan() or its wrappers (e.g., lavann::cfa() and lavaan::sem()). By default, semlbci() will find the LBCIs for all free parameters (except for variances and error variances) and user-defined parameters, which can take a long time for a model with many parameters. Moreover, LBCI is usually used when Wald-type confidence interval may not be suitable, for example, forming the confidence interval for an indirect effect or a parameter in the standardized solution. These parameters may have sampling distributions that are asymmetric or otherwise substantially nonnormal due to bounded parameter spaces or other reasons.

Therefore, it is recommended to call semlbci() without specifying any parameters. If the time to run is long, then call semlbci() only for selected parameters. The argument pars should be a model syntax or a vector of strings which specifies the parameters for which LBCIs will be formed (detailed below).

If time is not a concern, for example, when users want to explore the LBCIs for all free and user-defined parameters in a final model, then pars can be omitted to request the LBCIs for all free parameters (except for variances and covariances) and user-defined parameters (if any) in a model.

ciperc: The level of confidence

By default, the 95% LBCIs for the unstandardized solution will be formed. To change the level of confidence, set the argument ciperc to the desired coverage probability, e.g., .95 for 95%, .90 for 90%.

standardized: Whether standardized solution is used

By default, the LBCIs for the unstandardized solution will be formed. If the LBCIs for the standardized solution are desired, set standardized = TRUE. Note that for some models it can be much slower to find the LBCIs for the standardized solution than for the unstandardized solution.

parallel and ncpus

The search for the bounds needs to be done separately for each bound and this can take a long time for a model with many parameters and/or with equality constraints. Therefore, parallel processing should always be enabled by setting parallel to TRUE and ncpus to a number smaller than the number of available cores. For example, without parallel processing, the following search took about 28 seconds on Intel i7-8700:

data(HolzingerSwineford1939)
mod_test <-
'
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'
fit_cfa <- cfa(model = mod_test,
               data = HolzingerSwineford1939)
semlbci(fit_cfa)

With parallel processing enabled and using 6 cores, it took about 20 seconds.

semlbci(fit_cfa,
        parallel = TRUE,
        ncpus = 6)

The speed difference can be much greater for a model with many parameters and some equality constraints.

Enabling parallel processing also has the added benefit of showing the progress in real time.

Other arguments

For detailed documentation of other arguments, please refer to the help page of semlbci(). Advanced users who want to tweak the optimization options can check the help pages of ci_bound_wn_i() and ci_i_one(),

Additional Features

Multiple-group Models

semlbci() supports multiple-group models. For example, this is a two-group confirmatory factor analysis model with equality constraints:

data(HolzingerSwineford1939)
mod_cfa <-
'
visual  =~ x1 + v(lambda2, lambda2)*x2 + v(lambda3, lambda3)*x3
textual =~ x4 + v(lambda5, lambda5)*x5 + v(lambda6, lambda6)*x6
speed   =~ x7 + v(lambda8, lambda8)*x8 + v(lambda9, lambda9)*x9
'
fit_cfa <- cfa(model = mod_cfa,
               data = HolzingerSwineford1939,
               group = "school")

The factor correlations between group are not constrained to be equal.

parameterEstimates(fit_cfa)[c(22, 23, 58, 59), ]
#>       lhs op     rhs block group label   est    se     z pvalue ci.lower
#> 22 visual ~~ textual     1     1       0.416 0.097 4.271  0.000    0.225
#> 23 visual ~~   speed     1     1       0.169 0.064 2.643  0.008    0.044
#> 58 visual ~~ textual     2     2       0.437 0.099 4.423  0.000    0.243
#> 59 visual ~~   speed     2     2       0.314 0.079 3.958  0.000    0.158
#>    ci.upper
#> 22    0.606
#> 23    0.294
#> 58    0.631
#> 59    0.469

This is the LBCI for covariance between visual ability and textual ability:

fcov <- semlbci(fit_cfa,
                pars = c("visual ~~ textual"))
print(fcov,
      sem_out = fit_cfa,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> 
#> Group 1 [Pasteur]:
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.416    0.097    4.271    0.000    0.221    0.654
#> 
#> 
#> Group 2 [Grant-White]:
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.437    0.099    4.423    0.000    0.263    0.663

This is the LBCI for correlation between visual ability and textual ability:

fcor <- semlbci(fit_cfa,
                pars = c("visual ~~ textual"),
                standardized = TRUE)
print(fcor,
      sem_out = fit_cfa,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> 
#> Group 1 [Pasteur]:
#> 
#> Covariances:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.485    0.087    5.601    0.000    0.291    0.640
#> 
#> 
#> Group 2 [Grant-White]:
#> 
#> Covariances:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.540    0.086    6.317    0.000    0.357    0.692

Note that the example above can take more than one minute to one if parallel processing is not enabled.

Robust LBCI

semlbci() also supports the robust LBCI proposed by Falk (2018). To form robust LBCI, the model must be fitted with robust test statistics requested (e.g., estimator = "MLR"). To request robust LBCIs, add robust = "satorra.2000" when calling semlbci().

We use the simple mediation model as an example:

fit_robust <- sem(mod, simple_med,
                  fixed.x = FALSE,
                  estimator = "MLR")
fit_lbci_ab_robust <- semlbci(fit_robust,
                              pars = "ab := ",
                              robust = "satorra.2000")
print(fit_lbci_ab_robust,
      sem_out = fit_robust,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Sandwich
#>   Information bread                           Observed
#>   Observed information based on                Hessian
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.897    0.305    2.941    0.003    0.353    1.571

Latent level parameters

semlbci() support forming the LBCIs for most free parameters. Not illustrated above but LBCIs can be formed for path coefficients between latent variables and also user-defined parameters based on latent-level parameters, such as an indirect effect from one latent variable to another.

More examples

More examples can be found in the “examples” folders in the OSF page for this package.

Limitations

The following is a summary of the limitations of semlbci(). Please refer to check_sem_out() for the full list of limitations. This function is called by semlbci() to check the sem_out object, and will raise warnings or errors as appropriate.

Estimators

The function semlbci() currently supports lavaan::lavaan() results estimated by maximum likelihood (ML), full information maximum likelihood for missing data (fiml), and their robust variants (e.g., MLM).

Models

This package currently supports single and multiple group models with continuous variables. It may work for a model with ordered variables but this is not officially tested.

Methods

The current and preferred method is the one proposed by Wu and Neale (2012), illustrated by Pek and Wu (2015). The current implementation in semlbci() does not check whether a parameter is near its boundary. The more advanced methods by Pritikin, Rappaport, and Neale (2017) will be considered in future development.

Technical Details

A detailed presentation of the internal workflow of semlbci() can be found in the vignette("technical_workflow", package = "semlbci"). Users interested in calling the lowest level function, ci_bound_wn_i(), can see some illustrative examples in vignette("technical_searching_one_bound", package = "semlbci").

References

Cheung, S. F., & Pesigan, I. J. A. (2023). semlbci: An R package for forming likelihood-based confidence intervals for parameter estimates, correlations, indirect effects, and other derived parameters. Structural Equation Modeling: A Multidisciplinary Journal. Advance online publication. https://doi.org/10.1080/10705511.2023.2183860

Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? Structural Equation Modeling: A Multidisciplinary Journal, 25(2), 244-266. https://doi.org/10.1080/10705511.2017.1367254

Pek, J., & Wu, H. (2015). Profile likelihood-based confidence intervals and regions for structural equation models. Psychometrika, 80(4), 1123–1145. https://doi.org/10.1007/s11336-015-9461-1

Pritikin, J. N., Rappaport, L. M., & Neale, M. C. (2017). Likelihood-based confidence intervals for a parameter with an upper or lower bound. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 395-401. https://doi.org/10.1080/10705511.2016.1275969

Wu, H., & Neale, M. C. (2012). Adjusted confidence intervals for a bounded parameter. Behavior Genetics, 42(6), 886–898. https://doi.org/10.1007/s10519-012-9560-z