--- title: "Robust test statistics with semTests" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Robust test statistics with semTests} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE ) options(warn = -1) ``` ## Introduction `semTests` computes robust *p*-values for structural equation models fit with [`lavaan`](https://lavaan.ugent.be/). The standard chi-square test of model fit is anti-conservative under non-normality, and the usual corrections (Satorra–Bentler and friends) only partly fix it. The eigenvalue-based tests in this package target the *limiting null law* of the test statistic directly: that law is a weighted sum of independent chi-squares, \(\sum_j \lambda_j \chi^2_1\), whose weights \(\lambda_j\) are the eigenvalues of a matrix built from the model and the asymptotic covariance of the sample moments. Because this holds for any minimum-discrepancy estimator, the tests are not specific to normal-theory maximum likelihood. The penalized eigenvalue block-averaging (`pEBA`) and penalized regression (`pOLS`) methods were introduced by Foldnes, Moss, & Grønneberg (2024) and extended to nested model comparison by Foldnes, Grønneberg, & Moss (2026). ```{r setup} library("semTests") library("lavaan") ``` ## Basic usage: goodness of fit Fit a model with `lavaan`, then call `pvalues()`. Fit with a robust test (e.g. `estimator = "MLM"` or `"MLR"`) so that lavaan exposes the asymptotic moment covariance the correction needs. ```{r basic} model <- "visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9" fit <- cfa(model, HolzingerSwineford1939, estimator = "MLM") pvalues(fit) ``` The result is a named vector that also prints a one-line provenance footer recording the estimator, data type, information matrix, and degrees of freedom actually used. The full option record is available with `attr(x, "semtests")`. ### The test-name grammar Each requested test is a string of the form `(test)_(ug?)_(ml|rls)`: * **test** — the family: `pEBA` (penalized block-averaging, the recommended default; `j` is the number of blocks, typically 2–6), `pOLS` (penalized regression), `pall`/`all`, `eba`, or a traditional statistic `std`, `sb` (Satorra–Bentler), `ss` (scaled-and-shifted), `sf` (scaled *F*). * **ug** — include `UG` to use the unbiased Du–Bentler (2022) gamma instead of the standard biased one. * **ml / rls** — the base chi-square: `ML` (normal-theory discrepancy) or `RLS` (Browne's reweighted least squares). When omitted, a fit-appropriate default is chosen. Several tests can be requested at once: ```{r grammar} pvalues(fit, c("SB_ML", "pEBA4_ML", "pOLS2_ML")) ``` The `UG` gamma and the `RLS` statistic are defined only for the classical case (continuous, complete data, `estimator = "ML"`); off that case they are refused with an informative error rather than silently degrading. ## Nested model comparison To compare two nested models, fit both and pass them to `pvalues_nested()` with the **constrained** model first. Here we test weak invariance of the textual loadings: ```{r nested} constrained <- "visual =~ x1 + x2 + x3 textual =~ a*x4 + a*x5 + a*x6 speed =~ x7 + x8 + x9" m1 <- cfa(model, HolzingerSwineford1939, estimator = "MLM") m0 <- cfa(constrained, HolzingerSwineford1939, estimator = "MLM") pvalues_nested(m0, m1) ``` `pall` is the recommended default for nested comparison (Foldnes, Grønneberg, & Moss, 2026). The reduction `method` defaults to `"2000"` (Satorra 2000); `"2001"` is also available for complete-data fits. ## Other estimators and missing data (experimental) The same machinery applies to other estimators. The non-ML paths below are **experimental** as of this release; the classical ML path is stable. See `?semTests-support` for the authoritative matrix of what is supported. GLS and ULS work directly (ULS needs a robust test for the asymptotic covariance): ```{r estimators} gls <- cfa(model, HolzingerSwineford1939, estimator = "GLS") pvalues(gls, "pEBA4") uls <- cfa(model, HolzingerSwineford1939, estimator = "ULS", test = "satorra.bentler") pvalues(uls, "pEBA4") ``` Missing data is handled through full-information maximum likelihood. Fit with `missing = "fiml"` and pass the fit to `pvalues()` as usual. FIML uses the biased gamma and the standard statistic, so the `UG`/`RLS` suffixes do not apply; the default suffix-free tests are the right choice. ```{r fiml} HS <- HolzingerSwineford1939 set.seed(1) HS$x1[sample(nrow(HS), 60)] <- NA fit_fiml <- cfa(model, HS, missing = "fiml", estimator = "MLR") pvalues(fit_fiml) ``` FIML inference rests on the observed information under MAR (lavaan's default for `missing = "fiml"`); a fit using expected information triggers a warning, since that is valid only under the stronger MCAR assumption. ## References Du, H., & Bentler, P. M. (2022). 40-Year Old Unbiased Distribution Free Estimator Reliably Improves SEM Statistics for Nonnormal Data. *Structural Equation Modeling*, 29(6), 872–887. Foldnes, N., Moss, J., & Grønneberg, S. (2024). Improved goodness of fit procedures for structural equation models. *Structural Equation Modeling: A Multidisciplinary Journal*, 1–13. Foldnes, N., Grønneberg, S., & Moss, J. (2026). Penalized eigenvalue block averaging: Extension to nested model comparison and Monte Carlo evaluations. *Behavior Research Methods*.