--- title: "Nonparametric model" output: rmarkdown::html_vignette bibliography: references.bib vignette: > %\VignetteIndexEntry{Nonparametric model} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, output=FALSE} library(serosv) library(magrittr) ``` ## Local estimation by polynomial **Proposed model** Within the local polynomial framework, the linear predictor $\eta(a)$ is approximated locally at one particular value $a_0$ for age by a line (local linear, degree $p=1$) or a parabola (local quadratic, degree $p=2$). For a general degree $p$, the linear predictor for a neighbor of $a_0$, labeled $a_i$ is equivalent to the Taylor approximation $$ \eta(a_i) = \eta(a_0) + \eta^{(1)}(a_0)(a_i - a_0) + \frac{\eta^{(2)}(a_0)}{2}(a_i - a_0)^2 + ... + \frac{\eta^{(p)}(a_0)}{p!}(a_i - a_0)^p $$ $\eta(a_i)$ can be estimated by maximizing $$ \Sigma_{i=1}^{N} \ell_i \{Y_i, g^{-1} (\beta_0 + \beta_1(a_i-a_0)+ \beta_2(a_i-a_0)^2 ... + \beta_p(a_i-a_0)^p) \} K_h(a_i - a_0) $$ Where: - $\ell_i$ is the binomial log-likelihood $\ell_i\{Y_i, \pi\} = Y_ilog\{\pi\} + (1-Y_i)log\{1-\pi\}$ - $K_h$ is the kernel function with the specified smoothing parameter $h$, also called "bandwidth" or "window width" The estimator for the $k$-th derivative of $\eta(a_0)$, for $k = 0,1,…,p$ (degree of local polynomial) is thus $$ \hat{\eta}^{(k)}(a_0) = k!\hat{\beta}_k(a_0) $$ The estimator for the prevalence at age $a_0$ is then given by $$ \hat{\pi}(a_0) = g^{-1}\{ \hat{\beta}_0(a_0) \} $$ - Where $g$ is the link function The estimator for the force of infection at age $a_0$ by assuming $p \ge 1$ is as followed $$ \hat{\lambda}(a_0) = \hat{\beta}_1(a_0) \delta \{ \hat{\beta}_0 (a_0) \} $$ - Where $\delta \{ \hat{\beta}_0(a_0) \} = \frac{dg^{-1} \{ \hat{\beta}_0(a_0) \} } {d\hat{\beta}_0(a_0)}$ Refer to `Chapter 7.1` of the book by @Hens2012 for a more detailed explanation of the method. **Fitting data** ```{r} mump <- mumps_uk_1986_1987 ``` To fit a local estimation by polynomials, use `lp_model()` function with model parameters include: - `nn` the number of nearest neighbors to be included in the approximation (i.e., the nearest neighbor component of the smoothing parameter) - `h` the constant component of the smoothing parameter - `deg` the degree of the polynomial - `kern` the type of kernel ```{r} lp <- lp_model(mump, kern="tcub", nn = 0.5, deg=2) lp plot(lp) ``` The users can also provide `h` or `nn` as a numeric vector, in which case the package will use the parameter value that minimize the generalized cross-validation (GCV) criterion. ```{r} # ----- Tune nearest neighbor (nn) parameter lp_model(mump, kern="tcub", nn = seq(0.2, 0.8, 0.1), deg=2) %>% plot() # ----- Tune bandwidth (h) parameter lp_model(mump, kern="tcub", h = seq(5,25), deg=2) %>% plot() ``` Alternatively, use `plot_gcv()` to show GCV curves for the nearest neighbor method (left) and constant bandwidth (right) for manual parameter selection. ```{r, fig.width=7, fig.height=3} plot_gcv( mump$age, mump$pos, mump$tot, nn_seq = seq(0.2, 0.8, by=0.1), h_seq = seq(5, 25, by=1) ) ``` Based on the plot, we can then fit the model using the parameter value that give the lowest GCV (`nn = 0.3` and `h = 14`) ```{r} lp_model(mump, kern="tcub", nn = 0.3, deg=2) %>% plot() lp_model(mump, kern="tcub", h = 14, deg=2) %>% plot() ```