---
title: "Testing the CAR assumption"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Testing the CAR assumption}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

All estimation in **seine** rests on the Conditional Average Representativeness (CAR) assumption: that individual outcomes are mean-independent of predictor group membership, conditional on the observed covariates.
`ei_test_car()` provides a formal test of this assumption.
However, the test has important limitations that users should understand before interpreting its results.

# What the test does

The CAR assumption implies that the conditional expectation function (CEF) of the aggregate outcome takes a specific partially linear form.
`ei_test_car()` tests this implication by comparing a fully nonparametric estimate of the CEF to one constrained to that form, and evaluating the goodness-of-fit difference via a Wald statistic.
A significant result indicates that the data are inconsistent with the partially linear structure implied by CAR.

By default, the p-value is computed via a permutation test (Kennedy-Cade 1996) on the Wald statistic.
For large samples (2000 or more observations), the asymptotic chi-squared distribution, which is faster, is used by default instead.

```{r setup}
library(seine)
data(elec_1968)

spec = ei_spec(
    elec_1968,
    predictors = vap_white:vap_other,
    outcome = pres_dem_hum:pres_abs,
    total = pres_total,
    covariates = c(state, pop_city:pop_rural, farm:educ_coll, inc_00_03k:inc_25_99k),
    preproc = function(x) {
        x = model.matrix(~ 0 + ., x) # convert factors to dummies
        bases::b_bart(x, trees = 200)
    }
)

ei_test_car(spec, iter = 200) # use iter = 1000 or more in practice
```

The output is a data frame with one row per outcome variable.
The `W` column contains the Wald statistic, `df` its degrees of freedom, and `p.value` the p-value for each outcome.
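
To make these columns concrete, here is a minimal sketch using a made-up results table.
The numbers, the outcome labels, and the `outcome` and `p.adj` column names are assumptions for illustration only; they are not output from `ei_test_car()`.
The sketch shows how a Wald statistic and its degrees of freedom map to an asymptotic chi-squared p-value using base R's `pchisq()`, and how a Holm correction could then be applied with `p.adjust()`.
Permutation p-values, the default for smaller samples, will generally differ from these asymptotic values.

```r
# Hypothetical results table: made-up numbers, NOT real ei_test_car() output.
# The `outcome` column name and its labels are assumptions for illustration.
res = data.frame(
    outcome = c("outcome_a", "outcome_b", "outcome_c", "outcome_d"),
    W = c(31.2, 18.4, 45.9, 12.1),
    df = c(15, 15, 15, 15)
)

# Asymptotic p-value: upper tail of a chi-squared distribution with `df`
# degrees of freedom, evaluated at the Wald statistic.
res$p.value = pchisq(res$W, df = res$df, lower.tail = FALSE)

# Holm adjustment across the outcomes (one of several methods in p.adjust).
res$p.adj = p.adjust(res$p.value, method = "holm")

res
```

Holm's method is used here only as an example; which correction is appropriate, if any, depends on how the per-outcome tests will be interpreted.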
P-values are not adjusted for multiple testing by default; pass them to `p.adjust()` if a correction is desired.

# Limitations

`ei_test_car()` is a useful diagnostic, but its limitations are substantial and should be kept in mind when interpreting the results.

**The test only checks a necessary implication of CAR, not CAR itself.**
CAR is a condition on individual-level data, but only aggregate-level data are observed.
The test asks whether the aggregate CEF is inconsistent with CAR; a failure to reject does not mean CAR holds, only that the data are not in conflict with one of its implications.
There may be many forms of individual-level confounding that leave the aggregate CEF approximately in the partially linear form, and which the test will not detect.

**The test requires a rich basis expansion to have power.**
If the `preproc` argument to `ei_spec()` does not include a flexible basis expansion of the covariates and predictors, the test will have little power to detect violations of CAR.
An interaction between the predictors and covariates that is not captured by the basis will not be flagged.
A warning is issued if `preproc` is absent.
In general, the richer the basis expansion, the better the test can detect violations, but also the more data are needed for the test statistic to be well-calibrated.

**The test may be anti-conservative in small samples.**
The Wald statistic is only asymptotically chi-squared, and the permutation approximation of the null distribution may also be imperfect when the dimensionality of the basis expansion is large relative to the sample size.
In practice, this means the test may reject too often in small samples.
The `undersmooth` argument controls how aggressively the partially linear component is estimated, and increasing it can improve Type I error control at the cost of power.

**A significant result does not prevent estimation.**
Rejecting the null means the data suggest CAR does not hold exactly.
It does not mean that estimation with `ei_est()` is impossible or useless, only that the estimates may be biased.
In that case, the sensitivity analysis tools in `vignette("sensitivity")` are important for assessing how much the conclusions depend on the assumption.
Conversely, a non-significant result is weak evidence that the assumption holds and does not substitute for careful subject-matter reasoning about what confounders might be present.

# References

Helwig, N. E. (2022). Robust permutation tests for penalized splines. *Stats*, 5(3), 916-933.

Kennedy, P. E., & Cade, B. S. (1996). Randomization tests for multiple regression. *Communications in Statistics-Simulation and Computation*, 25(4), 923-936.

McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper [arXiv:2509.20194](https://arxiv.org/abs/2509.20194).

This vignette was originally produced by a large language model, and then reviewed and edited by the package authors.