--- title: "Parametric models" output: rmarkdown::html_vignette bibliography: references.bib link-citations: TRUE vignette: > %\VignetteIndexEntry{Parametric models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, output=FALSE, warning=FALSE, message=FALSE} library(serosv) library(dplyr) library(magrittr) ``` # Frequentist methods ## Polynomial models Seroprevalence is modeled using the following serocatalytic model $$ \pi(a) = 1 - \text{exp}({-\Sigma_{i=1}^k \beta_i a^i}) $$ Where: - $a$ is the variable age - $\pi$ is the seroprevalence of the population at age $a$ - $k$ is the degree of the polynomial - $\beta_i$ are the model parameters Which implies the force of infection is $\lambda(a) = \Sigma_{i=1}^k \beta_i i a^{i-1}$ This generalization encompasses several classical serocatalytic model including - **Muench model** (assuming $k=1$) [@muench_derivation_1934] - **Griffith model** (assuming $k = 2$) - **Grenfell and Anderson model** (assuming higher degree $k$) [@grenfell_estimation_1985] Refer to `Chapter 6.1.1` of the book by @Hens2012 for a more detailed explanation of the methods. **Fitting data** We will use the `Parvo B19` data from Finland 1997--1998 for this example. ```{r} data <- parvob19_fi_1997_1998[order(parvob19_fi_1997_1998$age), ] ``` To fit a polynomial model, use the `polynomial_model()` function. ```{r} # Fit a Muench model muench <- polynomial_model(data, k = 1, status_col="seropositive") summary(muench$info) plot(muench) ``` The users can also choose to provide a range of values for $k$ in which case the package will try to find the best $k$ parameter determined by Loglikelihood Ratio test (LRT) ```{r} # Provide a range of values for k best_param <- polynomial_model(data, k = 1:5, status_col = "seropositive") plot(best_param) # View the best model here which suggests k = 4 is the best parameter value best_param ``` ------------------------------------------------------------------------ ## Fractional polynomial model **Proposed model** Fractional polynomial model generalize conventional polynomial class of functions. In the context of binary responses, a fractional polynomial of degree $m$ for the linear predictor is defined as followed $$ \eta_m(a, \beta, p_1, p_2, ...,p_m) = \Sigma^m_{i=0} \beta_i H_i(a) $$ Where $m$ is an integer, $p_1 \le p_2 \le... \le p_m$ is a sequence of powers, and $H_i(a)$ is a transformation given by $$ H_i = \begin{cases} a^{p_i} & \text{ if } p_i \neq p_{i-1}, \\ H_{i-1}(a) \times log(a) & \text{ if } p_i = p_{i-1}, \end{cases} $$ Refer to `Chapter 6.2` of the book by @Hens2012 for a more detailed explanation of the methods. **Fitting data** Use `fp_model()` to fit a fractional polynomial model. The parameter `p` specifies the powers of each polynomial term (length of `p` is thus the model's degree) ```{r} hav <- hav_be_1993_1994 model <- fp_model(hav, p=c(1, 1.5), link="cloglog") model plot(model) ``` The users can also tell the package to perform parameter selection by providing `p` as a named list with 2 elements: - `degree` the maximum number of terms to search over - `p_range` the possible powers for each term ```{r, warning=FALSE} model <- fp_model(hav, p=list( p_range=seq(-2,3,0.1), degree=2 ), monotonic=FALSE, link="cloglog") plot(model) # the best set of powers for this dataset is 1.5 and 1.6 model ``` To restrict the parameters search such that the predictions are monotonic (thus ensuring the FOI to be $\lambda \geq 0$) set `monotonic=TRUE` ```{r warning=FALSE} # ---- Best model with the monotonic constraint ----- model <- fp_model(hav, p=list( p_range=seq(-2,3,0.1), degree=2 ), monotonic=TRUE, link="cloglog") plot(model) # the best set of powers with the monotonic constraint is 0.5 and 1.1 model ``` ------------------------------------------------------------------------ ## Nonlinear models ### Farrington model **Proposed model** For Farrington's model, the force of infection was defined non-negative for all a $\lambda(a) \geq 0$ and increases to a peak in a linear fashion followed by an exponential decrease $$ \lambda(a) = (\alpha a - \gamma)e^{-\beta a} + \gamma $$ Where $\gamma$ is called the long term residual for FOI, as $a \rightarrow \infty$ , $\lambda (a) \rightarrow \gamma$ Integrating $\lambda(a)$ would results in the following non-linear model for prevalence $$ \pi (a) = 1 - e^{-\int_0^a \lambda(s) ds} \\ = 1 - exp\{ \frac{\alpha}{\beta}ae^{-\beta a} + \frac{1}{\beta}(\frac{\alpha}{\beta} - \gamma)(e^{-\beta a} - 1) -\gamma a \} $$ Refer to `Chapter 6.1.2` of the book by @Hens2012 for a more detailed explanation of the methods. **Fitting data** Use `farrington_model()` to fit a **Farrington**'s model. ```{r, warning=FALSE} farrington_md <- farrington_model( rubella_uk_1986_1987, start=list(alpha=0.07,beta=0.1,gamma=0.03) ) farrington_md plot(farrington_md) ``` ### Weibull model **Proposed model** For a Weibull model, the prevalence is given by $$ \pi (d) = 1 - e^{ - \beta_0 d ^ {\beta_1}} $$ Where $d$ is exposure time (difference between age of injection and age at test) The model was reformulated as a GLM model with log - log link and linear predictor using log(d) $$\eta(d) = log(\beta_0) + \beta_1 log(d)$$ Thus implies that the force of infection is a monotone function of the exposure time as followed $$ \lambda(d) = \beta_0 \beta_1 d^{\beta_1 - 1} $$ Refer to `Chapter 6.1.2` of the book by @Hens2012 for a more detailed explanation of the methods. **Fitting data** Use `weibull_model()` to fit a Weibull model. ```{r} hcv <- hcv_be_2006[order(hcv_be_2006$dur), ] wb_md <- hcv %>% weibull_model(t_lab = "dur", status_col="seropositive") wb_md plot(wb_md) ``` # Bayesian methods **Proposed approach** Consider a model for prevalence that has a parametric form $\pi(a_i, \alpha)$ where $\alpha$ is a parameter vector One can constraint the parameter space of the prior distribution $P(\alpha)$ in order to achieve the desired monotonicity of the posterior distribution $P(\pi_1, \pi_2, ..., \pi_m|y,n)$ Where: - $n = (n_1, n_2, ..., n_m)$ and $n_i$ is the sample size at age $a_i$ - $y = (y_1, y_2, ..., y_m)$ and $y_i$ is the number of infected individual from the $n_i$ sampled subjects ## Farrington **Proposed model** The model for prevalence is as followed $$ \pi (a) = 1 - exp\{ \frac{\alpha_1}{\alpha_2}ae^{-\alpha_2 a} + \frac{1}{\alpha_2}(\frac{\alpha_1}{\alpha_2} - \alpha_3)(e^{-\alpha_2 a} - 1) -\alpha_3 a \} $$ For likelihood model, independent binomial distribution are assumed for the number of infected individuals at age $a_i$ $$ y_i \sim Bin(n_i, \pi_i), \text{ for } i = 1,2,3,...m $$ The constraint on the parameter space can be incorporated by assuming truncated normal distribution for the components of $\alpha$, $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ in $\pi_i = \pi(a_i,\alpha)$ $$ \alpha_j \sim \text{truncated } \mathcal{N}(\mu_j, \tau_j), \text{ } j = 1,2,3 $$ The joint posterior distribution for $\alpha$ can be derived by combining the likelihood and prior as followed $$ P(\alpha|y) \propto \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \prod^3_{i=1}-\frac{1}{\tau_j}\text{exp}(\frac{1}{2\tau^2_j} (\alpha_j - \mu_j)^2) $$ - Where the flat hyperprior distribution is defined as followed: - $\mu_j \sim \mathcal{N}(0, 10000)$ - $\tau^{-2}_j \sim \Gamma(100,100)$ The full conditional distribution of $\alpha_i$ is thus $$ P(\alpha_i|\alpha_j,\alpha_k, k, j \neq i) \propto -\frac{1}{\tau_i}\text{exp}(\frac{1}{2\tau^2_i} (\alpha_i - \mu_i)^2) \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) $$ Refer to `Chapter 10.3.1` of the book by @Hens2012 for a more detailed explanation of the method. **Fitting data** To fit Farrington model, use `hierarchical_bayesian_model()` and define `type = "far2"` or `type = "far3"` where - `type = "far2"` refers to Farrington model with 2 parameters ($\alpha_3 = 0$) - `type = "far3"` refers to Farrington model with 3 parameters ($\alpha_3 > 0$) ```{r message=FALSE, output=FALSE, warning=FALSE} df <- mumps_uk_1986_1987 model <- hierarchical_bayesian_model(df, type="far3") ``` ```{r} model plot(model) ``` ## Log-logistic **Proposed approach** The model for seroprevalence is as followed $$ \pi(a) = \frac{\beta a^\alpha}{1 + \beta a^\alpha}, \text{ } \alpha, \beta > 0 $$ The likelihood is specified to be the same as Farrington model ($y_i \sim Bin(n_i, \pi_i)$) with $$ \text{logit}(\pi(a)) = \alpha_2 + \alpha_1\log(a) $$ - Where $\alpha_2 = \text{log}(\beta)$ The prior model of $\alpha_1$ is specified as $\alpha_1 \sim \text{truncated } \mathcal{N}(\mu_1, \tau_1)$ with flat hyperprior as in Farrington model $\beta$ is constrained to be positive by specifying $\alpha_2 \sim \mathcal{N}(\mu_2, \tau_2)$ The full conditional distribution of $\alpha_1$ is thus $$ P(\alpha_1|\alpha_2) \propto -\frac{1}{\tau_1} \text{exp} (\frac{1}{2 \tau_1^2} (\alpha_1 - \mu_1)^2) \prod_{i=1}^m \text{Bin}(y_i|n_i,\pi(a_i, \alpha_1, \alpha_2) ) $$ And $\alpha_2$ can be derived in the same way Refer to `Chapter 10.3.3` of the book by @Hens2012 for a more detailed explanation of the method. **Fitting data** To fit Log-logistic model, use `hierarchical_bayesian_model()` and define `type = "log_logistic"` ```{r message=FALSE, output=FALSE, warning=FALSE} df <- rubella_uk_1986_1987 model <- hierarchical_bayesian_model(df, type="log_logistic") ``` ```{r} model plot(model) ```