---
title: "Introduction to smriti: Structural Variance Preservation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to smriti}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Imputation Uncertainty Principle

Modern machine learning imputation algorithms (such as `missForest`) excel at minimizing point-wise prediction error (RMSE). However, a predictor tuned for point-wise accuracy pulls each imputed value toward its conditional mean, which systematically shrinks the variance of the imputed values and causes **structural variance collapse**. In longitudinal Growth Curve Models (GCM), this deflates the estimated latent slope variance ($\sigma^2_S$) and erodes the statistical power needed to detect individual differences in patient trajectories over time.

The `smriti` package addresses this by decoupling prediction from structural geometry. It uses a two-stage architecture:

1. **Initialization:** Non-parametric imputation fills the missing entries to produce a complete data matrix.
2. **Lagrangian Projection:** A C++ gradient descent layer moves the imputed values toward a target covariance structure while preserving fidelity to the initial imputation.

The augmented loss function is

$$L(X) = \frac{1}{2}\|X - X_{\text{imp}}\|_F^2 + \frac{\lambda}{2}\|\operatorname{cov}(X) - \Sigma_{\text{target}}\|_F^2$$

where the first term anchors the solution near the initial imputation $X_{\text{imp}}$ and the second, weighted by $\lambda$, penalizes departures of the sample covariance from $\Sigma_{\text{target}}$.

## The Robustness-Efficiency Tradeoff

Real-world clinical data often contain heavy-tailed or skewed distributions and corrupted sensor artifacts. `smriti_impute()` handles this through its `robust` argument:

* `robust = FALSE`: Uses the pairwise-complete Pearson covariance, projected to the nearest positive-semidefinite (PSD) matrix to correct any non-PSD artifacts introduced by pairwise deletion. Best for well-behaved, approximately Normal data.
* `robust = TRUE`: Constructs the target from pairwise Spearman correlations (rank-based, outlier-resistant) and column-wise MAD scale estimates.
  The resulting matrix is then projected to the nearest PSD matrix, giving a target that is not dominated by severe outliers (e.g., readings from broken EHR sensors).

## Fidelity-Constraint Balance

The penalty weight `lambda` controls the trade-off between preserving the initial imputed values and matching the target covariance. At `lambda = 1.0` (the default) the two terms of the loss receive equal nominal weight; because the fidelity term sums over all $n \times p$ entries of $X$ while the covariance term sums over only $p^2$ entries, their effective balance also depends on the number of rows. Increasing `lambda` enforces the covariance constraint more strictly at the cost of greater deviation from the initial imputation. The `learning_rate` (default `0.001`) sets the gradient step size, and `max_iter` (default `2000`) caps the number of optimization iterations.

## Example: Shielding Against Corrupted EHR Data

```{r, eval=FALSE}
library(smriti)
library(missForest)

# Load clinical data with structural missingness and sensor artifacts
data <- read.csv("clinical_proxy.csv")

# Robust refinement: Spearman/MAD target, then covariance projection
clean_data <- smriti_impute(
  data = data,
  time_cols = c("T1", "T2", "T3", "T4"),
  robust = TRUE,
  lambda = 1.0
)
```
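To make the two `robust` modes concrete, the chunk below builds both kinds of target in base R on simulated data. It is a sketch, not the package's implementation: `build_target()` and `nearest_psd()` are hypothetical helpers, the PSD projection is assumed to be eigenvalue clipping (which yields the nearest symmetric PSD matrix in Frobenius norm), and the Spearman matrix is assumed to be rescaled by the MAD values directly, without any Normal-consistency correction.

```{r}
# Hypothetical helper: nearest PSD matrix (Frobenius norm) via eigenvalue clipping
nearest_psd <- function(S) {
  S <- (S + t(S)) / 2                       # symmetrize
  e <- eigen(S, symmetric = TRUE)
  vals <- pmax(e$values, 0)                 # clip negative eigenvalues to zero
  e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
}

# Hypothetical helper: target covariance for the two `robust` modes
build_target <- function(X, robust = FALSE) {
  if (robust) {
    # Rank-based correlation, pairwise-complete
    R <- cor(X, method = "spearman", use = "pairwise.complete.obs")
    # Column-wise robust scale (mad() uses the 1.4826 Normal constant)
    s <- apply(X, 2, mad, na.rm = TRUE)
    S <- diag(s, nrow = length(s)) %*% R %*% diag(s, nrow = length(s))
  } else {
    # Pairwise-complete Pearson covariance
    S <- cov(X, use = "pairwise.complete.obs")
  }
  nearest_psd(S)
}

# Simulated panel: 4 time points, some missingness, one corrupted reading
set.seed(42)
X <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, paste0("T", 1:4)))
X[sample(length(X), 20)] <- NA
X[1, 1] <- 1e6                              # broken-sensor artifact

S_pearson <- build_target(X, robust = FALSE)
S_robust  <- build_target(X, robust = TRUE)
c(pearson = S_pearson[1, 1], robust = S_robust[1, 1])
```

The single corrupted value inflates the Pearson variance of `T1` by many orders of magnitude, while the Spearman/MAD target keeps it on the scale of the uncorrupted readings.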
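The update rule behind the projection stage follows from differentiating the augmented loss $L(X)$. Writing the sample covariance with the centering matrix $C = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ gives the gradient below; this is a standard derivation from the loss as stated, and whether the package's C++ layer uses exactly this form (for example, updating all entries or only the originally missing ones) is an open assumption.

$$
\begin{aligned}
\operatorname{cov}(X) &= \frac{1}{n-1} X^\top C X, \qquad X_c = C X, \qquad D = \operatorname{cov}(X) - \Sigma_{\text{target}},\\
d\!\left[\tfrac{1}{2}\|D\|_F^2\right] &= \operatorname{tr}\!\left(D\, d\operatorname{cov}(X)\right)
= \frac{1}{n-1}\operatorname{tr}\!\left(D\,(dX^\top C X + X^\top C\, dX)\right)
= \frac{2}{n-1}\operatorname{tr}\!\left((X_c D)^\top dX\right),\\
\nabla_X L &= (X - X_{\text{imp}}) + \frac{2\lambda}{n-1}\, X_c D,
\qquad X \leftarrow X - \eta\, \nabla_X L,
\end{aligned}
$$

where $\eta$ plays the role of `learning_rate`. The first gradient term pulls $X$ back toward $X_{\text{imp}}$; when the covariance is too small ($D$ negative definite), the second term scales the centered data $X_c$ outward, restoring variance.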
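The projection stage can be sketched in base R as fixed-step gradient descent on $L(X)$. `project_covariance()` and `smriti_loss()` are hypothetical stand-ins for the package's C++ layer: they mirror the documented defaults (`lambda = 1.0`, `learning_rate = 0.001`, `max_iter = 2000`), but updating every entry of $X$ and omitting any convergence check are assumptions of this sketch. The demo mimics variance collapse by shrinking complete data halfway toward zero and then projecting toward the original covariance.

```{r}
# Augmented loss:
#   L(X) = 1/2 ||X - X_imp||_F^2 + lambda/2 ||cov(X) - Sigma_target||_F^2
smriti_loss <- function(X, X_imp, Sigma_target, lambda) {
  0.5 * sum((X - X_imp)^2) + 0.5 * lambda * sum((cov(X) - Sigma_target)^2)
}

# Hypothetical base-R stand-in for the C++ projection layer (assumption):
# fixed-step gradient descent on every entry of X, no early stopping.
project_covariance <- function(X_imp, Sigma_target, lambda = 1.0,
                               learning_rate = 0.001, max_iter = 2000) {
  n <- nrow(X_imp)
  X <- X_imp
  for (iter in seq_len(max_iter)) {
    Xc <- sweep(X, 2, colMeans(X))          # column-centered data
    D  <- cov(X) - Sigma_target             # covariance residual
    # Gradient: fidelity term + covariance penalty term
    grad <- (X - X_imp) + (2 * lambda / (n - 1)) * (Xc %*% D)
    X <- X - learning_rate * grad
  }
  X
}

# Mimic variance collapse: shrink complete data halfway toward zero
set.seed(7)
X_true <- matrix(rnorm(400), ncol = 4)
Sigma_target <- cov(X_true)
X_imp <- 0.5 * X_true                       # cov(X_imp) = 0.25 * Sigma_target

X_ref <- project_covariance(X_imp, Sigma_target, lambda = 100)

# Total variance before projection, after projection, and in the target
c(before = sum(diag(cov(X_imp))),
  after  = sum(diag(cov(X_ref))),
  target = sum(diag(Sigma_target)))
```

The demo uses `lambda = 100` because, with 100 rows, the fidelity term outweighs the covariance penalty at the default `lambda = 1.0` and the covariance barely moves. The restored variance lands between the collapsed and target values, which is the fidelity-constraint compromise the loss encodes.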