---
title: "Parametric models"
output: rmarkdown::html_vignette
bibliography: references.bib
link-citations: TRUE
vignette: >
  %\VignetteIndexEntry{Parametric models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, output=FALSE, warning=FALSE, message=FALSE}
library(serosv)
library(dplyr)
library(magrittr)
```

# Frequentist methods

## Polynomial models

Seroprevalence is modeled using the following serocatalytic model

$$
\pi(a)  = 1 - \text{exp}({-\Sigma_{i=1}^k \beta_i a^i})
$$

Where:

-   $a$ is the variable age

-   $\pi$ is the seroprevalence of the population at age $a$

-   $k$ is the degree of the polynomial

-   $\beta_i$ are the model parameters

Which implies the force of infection is $\lambda(a) = \Sigma_{i=1}^k \beta_i i a^{i-1}$

This generalization encompasses several classical serocatalytic model including

-   **Muench model** (assuming $k=1$) [@muench_derivation_1934]

-   **Griffith model** (assuming $k = 2$)

-   **Grenfell and Anderson model** (assuming higher degree $k$) [@grenfell_estimation_1985]

Refer to `Chapter 6.1.1` of the book by @Hens2012 for a more detailed explanation of the methods.

**Fitting data**

We will use the `Parvo B19` data from Finland 1997--1998 for this example.

```{r}
data <- parvob19_fi_1997_1998[order(parvob19_fi_1997_1998$age), ]
```

To fit a polynomial model, use the `polynomial_model()` function.

```{r}
# Fit a Muench model
muench <- polynomial_model(data, k = 1, status_col="seropositive")
summary(muench$info)
plot(muench) 
```

The users can also choose to provide a range of values for $k$ in which case the package will try to find the best $k$ parameter determined by Loglikelihood Ratio test (LRT)

```{r}
# Provide a range of values for k
best_param <- polynomial_model(data, k = 1:5, status_col = "seropositive")
plot(best_param)
# View the best model here which suggests k = 4 is the best parameter value
best_param
```

------------------------------------------------------------------------

## Fractional polynomial model

**Proposed model**

Fractional polynomial model generalize conventional polynomial class of functions. In the context of binary responses, a fractional polynomial of degree $m$ for the linear predictor is defined as followed

$$
\eta_m(a, \beta, p_1, p_2, ...,p_m) = \Sigma^m_{i=0} \beta_i H_i(a)
$$

Where $m$ is an integer, $p_1 \le p_2 \le... \le p_m$ is a sequence of powers, and $H_i(a)$ is a transformation given by

$$
H_i = \begin{cases}
 a^{p_i} & \text{ if } p_i \neq p_{i-1},
 \\ H_{i-1}(a) \times log(a)  & \text{ if } p_i = p_{i-1},
\end{cases}
$$

Refer to `Chapter 6.2` of the book by @Hens2012 for a more detailed explanation of the methods.

**Fitting data**

Use `fp_model()` to fit a fractional polynomial model.

The parameter `p` specifies the powers of each polynomial term (length of `p` is thus the model's degree)

```{r}
hav <- hav_be_1993_1994
model <- fp_model(hav, p=c(1, 1.5), link="cloglog")
model
plot(model)
```

The users can also tell the package to perform parameter selection by providing `p` as a named list with 2 elements:

-   `degree` the maximum number of terms to search over
-   `p_range` the possible powers for each term

```{r, warning=FALSE}
model <- fp_model(hav, 
                  p=list(
                    p_range=seq(-2,3,0.1),
                    degree=2
                  ), 
                  monotonic=FALSE,
                  link="cloglog")
plot(model)
# the best set of powers for this dataset is 1.5 and 1.6
model
```

To restrict the parameters search such that the predictions are monotonic (thus ensuring the FOI to be $\lambda \geq 0$) set `monotonic=TRUE`

```{r warning=FALSE}
# ---- Best model with the monotonic constraint -----
model <- fp_model(hav, 
                  p=list(
                    p_range=seq(-2,3,0.1),
                    degree=2
                  ), 
                  monotonic=TRUE,
                  link="cloglog")
plot(model)
# the best set of powers with the monotonic constraint is 0.5 and 1.1
model
```

------------------------------------------------------------------------

## Nonlinear models

### Farrington model

**Proposed model**

For Farrington's model, the force of infection was defined non-negative for all a $\lambda(a) \geq 0$ and increases to a peak in a linear fashion followed by an exponential decrease

$$
\lambda(a) = (\alpha a - \gamma)e^{-\beta a} + \gamma
$$

Where $\gamma$ is called the long term residual for FOI, as $a \rightarrow \infty$ , $\lambda (a) \rightarrow \gamma$

Integrating $\lambda(a)$ would results in the following non-linear model for prevalence

$$
\pi (a) = 1 - e^{-\int_0^a \lambda(s) ds} \\ = 1 - exp\{ \frac{\alpha}{\beta}ae^{-\beta a} + \frac{1}{\beta}(\frac{\alpha}{\beta} - \gamma)(e^{-\beta a} - 1) -\gamma a \}
$$

Refer to `Chapter 6.1.2` of the book by @Hens2012 for a more detailed explanation of the methods.

**Fitting data**

Use `farrington_model()` to fit a **Farrington**'s model.

```{r, warning=FALSE}
farrington_md <- farrington_model(
   rubella_uk_1986_1987,
   start=list(alpha=0.07,beta=0.1,gamma=0.03)
   )
farrington_md
plot(farrington_md)
```

### Weibull model

**Proposed model**

For a Weibull model, the prevalence is given by

$$
\pi (d) = 1 - e^{ - \beta_0 d ^ {\beta_1}} 
$$

Where $d$ is exposure time (difference between age of injection and age at test)

The model was reformulated as a GLM model with log - log link and linear predictor using log(d)

$$\eta(d) = log(\beta_0) + \beta_1 log(d)$$

Thus implies that the force of infection is a monotone function of the exposure time as followed

$$
\lambda(d) = \beta_0 \beta_1 d^{\beta_1 - 1}
$$

Refer to `Chapter 6.1.2` of the book by @Hens2012 for a more detailed explanation of the methods.

**Fitting data**

Use `weibull_model()` to fit a Weibull model.

```{r}
hcv <- hcv_be_2006[order(hcv_be_2006$dur), ]

wb_md <- hcv %>% weibull_model(t_lab = "dur", status_col="seropositive")
wb_md
plot(wb_md) 
```

# Bayesian methods

**Proposed approach**

Consider a model for prevalence that has a parametric form $\pi(a_i, \alpha)$ where $\alpha$ is a parameter vector

One can constraint the parameter space of the prior distribution $P(\alpha)$ in order to achieve the desired monotonicity of the posterior distribution $P(\pi_1, \pi_2, ..., \pi_m|y,n)$

Where:

-   $n = (n_1, n_2, ..., n_m)$ and $n_i$ is the sample size at age $a_i$

-   $y = (y_1, y_2, ..., y_m)$ and $y_i$ is the number of infected individual from the $n_i$ sampled subjects

## Farrington

**Proposed model**

The model for prevalence is as followed

$$
\pi (a) = 1 - exp\{ \frac{\alpha_1}{\alpha_2}ae^{-\alpha_2 a} + \frac{1}{\alpha_2}(\frac{\alpha_1}{\alpha_2} - \alpha_3)(e^{-\alpha_2 a} - 1) -\alpha_3 a \}
$$

For likelihood model, independent binomial distribution are assumed for the number of infected individuals at age $a_i$

$$
y_i \sim Bin(n_i, \pi_i), \text{  for } i = 1,2,3,...m
$$

The constraint on the parameter space can be incorporated by assuming truncated normal distribution for the components of $\alpha$, $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ in $\pi_i = \pi(a_i,\alpha)$

$$
\alpha_j \sim \text{truncated  } \mathcal{N}(\mu_j, \tau_j), \text{ } j = 1,2,3
$$

The joint posterior distribution for $\alpha$ can be derived by combining the likelihood and prior as followed

$$
P(\alpha|y) \propto \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \prod^3_{i=1}-\frac{1}{\tau_j}\text{exp}(\frac{1}{2\tau^2_j} (\alpha_j - \mu_j)^2)
$$

-   Where the flat hyperprior distribution is defined as followed:

    -   $\mu_j \sim \mathcal{N}(0, 10000)$

    -   $\tau^{-2}_j \sim \Gamma(100,100)$

The full conditional distribution of $\alpha_i$ is thus $$
P(\alpha_i|\alpha_j,\alpha_k, k, j \neq i) \propto  -\frac{1}{\tau_i}\text{exp}(\frac{1}{2\tau^2_i} (\alpha_i - \mu_i)^2) \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha))
$$

Refer to `Chapter 10.3.1` of the book by @Hens2012 for a more detailed explanation of the method.

**Fitting data**

To fit Farrington model, use `hierarchical_bayesian_model()` and define `type = "far2"` or `type = "far3"` where

-   `type = "far2"` refers to Farrington model with 2 parameters ($\alpha_3 = 0$)

-   `type = "far3"` refers to Farrington model with 3 parameters ($\alpha_3 > 0$)

```{r message=FALSE, output=FALSE, warning=FALSE}
df <- mumps_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="far3")
```

```{r}
model
plot(model)
```

## Log-logistic

**Proposed approach**

The model for seroprevalence is as followed

$$
\pi(a) = \frac{\beta a^\alpha}{1 + \beta a^\alpha}, \text{ } \alpha, \beta > 0
$$

The likelihood is specified to be the same as Farrington model ($y_i \sim Bin(n_i, \pi_i)$) with

$$
\text{logit}(\pi(a)) = \alpha_2 + \alpha_1\log(a)
$$

-   Where $\alpha_2 = \text{log}(\beta)$

The prior model of $\alpha_1$ is specified as $\alpha_1 \sim \text{truncated  } \mathcal{N}(\mu_1, \tau_1)$ with flat hyperprior as in Farrington model

$\beta$ is constrained to be positive by specifying $\alpha_2 \sim \mathcal{N}(\mu_2, \tau_2)$

The full conditional distribution of $\alpha_1$ is thus

$$
P(\alpha_1|\alpha_2) \propto -\frac{1}{\tau_1} \text{exp} (\frac{1}{2 \tau_1^2} (\alpha_1 - \mu_1)^2)
\prod_{i=1}^m \text{Bin}(y_i|n_i,\pi(a_i, \alpha_1, \alpha_2) )
$$

And $\alpha_2$ can be derived in the same way

Refer to `Chapter 10.3.3` of the book by @Hens2012 for a more detailed explanation of the method.

**Fitting data**

To fit Log-logistic model, use `hierarchical_bayesian_model()` and define `type = "log_logistic"`

```{r message=FALSE, output=FALSE, warning=FALSE}
df <- rubella_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="log_logistic")
```

```{r}
model
plot(model)
```