---
title: "VA-Calibration"
author: "Sandipan Pramanik, Emily Wilson, Jacob Fiksel, Brian Gilbert, Abhirup Datta"
date: "`r format(Sys.Date())`"
output:
  rmarkdown::html_document:
    toc: true
    toc_float: true
    toc_depth: 2
    theme: cosmo
    css: custom.css
vignette: >
  %\VignetteIndexEntry{Intro to vacalibration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

<a id="top"></a>

```{r, results='asis', echo=FALSE}
cat('
<style>
  pre {
    position: relative;
  }
  .copy-btn {
    position: absolute;
    top: 6px;
    right: 6px;
    background-color: #f0f0f0;
    border: 1px solid #ccc;
    border-radius: 4px;
    padding: 4px 8px;
    font-size: 12px;
    cursor: pointer;
    opacity: 0.7;
    transition: opacity 0.3s ease;
  }
  .copy-btn:hover {
    opacity: 1;
  }
</style>

<script>
document.addEventListener("DOMContentLoaded", function() {
  document.querySelectorAll("pre").forEach(function(pre) {
    const button = document.createElement("button");
    button.className = "copy-btn";
    button.textContent = "Copy";
    pre.style.position = "relative";
    pre.appendChild(button);

    button.addEventListener("click", function() {
      const code = pre.querySelector("code");
      if (!code) return;
      navigator.clipboard.writeText(code.innerText).then(() => {
        button.textContent = "Copied!";
        setTimeout(() => {
          button.textContent = "Copy";
        }, 1500);
      });
    });
  });
});
</script>
')
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Key Components of the Package

-   <strong><u>CCVA Misclassification Matrices</u></strong>: Stored as `CCVA_missmat`, this is the inventory of uncertainty-quantified misclassification matrices for three computer-coded verbal autopsy (CCVA) algorithms: expert algorithm or [EAVA](https://pubmed.ncbi.nlm.nih.gov/25969734/), [InSilicoVA](https://doi.org/10.1080/01621459.2016.1152191), and [InterVA](https://doi.org/10.3402/gha.v5i0.19281). The matrices are derived using the modeling framework of [Pramanik et al. (2025)](https://doi.org/10.1214/24-AOAS2006) applied to the data collected in the Child Health and Mortality Prevention Surveillance ([CHAMPS](https://champshealth.org/)) project (see [Pramanik et al. (2026)](https://doi.org/10.1136/bmjgh-2025-021747) for details). Due to file size limits, the posterior samples are hosted on the [GitHub repository](https://github.com/sandy-pramanik/CCVA-Misclassification-Matrices), with the .rda file available under the [release](https://github.com/sandy-pramanik/CCVA-Misclassification-Matrices/releases/tag/20241004). Please refer to this package and the GitHub repository for future updates.

-   <strong><u>Example VA-only Data from COMSA-Mozambique</u></strong>: Stored as `comsamoz_CCVAoutput`, this object contains the outputs of CCVA analyses of the [publicly available](https://comsamozambique.org/data-access) verbal autopsy (VA) survey data from [COMSA-Mozambique](https://comsamozambique.org/) for children under age five. It includes outputs from EAVA (obtained with the [EAVA](https://CRAN.R-project.org/package=EAVA) package) and from InSilicoVA and InterVA (obtained with the [openVA](https://CRAN.R-project.org/package=openVA) package). The analyses cover two age groups: neonates (0-27 days) and children (1-59 months).

-   <strong>`vacalibration()`</strong>: This is the main function for performing calibration. For EAVA, InSilicoVA, and InterVA, it directly takes outputs from [EAVA](https://CRAN.R-project.org/package=EAVA) and [openVA](https://CRAN.R-project.org/package=openVA) and produces calibrated estimates of cause-specific mortality fractions (CSMFs). More generally, it calibrates population-level prevalence estimates derived from single-class predictions of any discrete classifier; in that case, users need to provide fixed or uncertainty-quantified misclassification matrices.

-   <strong>`plot_vacalib()`</strong>: This produces a figure that shows the misclassification matrix used for calibration and compares uncalibrated and calibrated CSMF estimates.
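The relationship that calibration exploits can be sketched with a toy example. The sketch below assumes that entry `M[i, j]` of a misclassification matrix is the probability that the algorithm assigns cause `j` to a death whose true cause is `i`, so that the CSMF reported by the algorithm is expected to be `q = t(M) %*% p`, where `p` is the true CSMF. The matrix values and the helper `naive_calibrate()` are hypothetical, and the plug-in inversion shown here only illustrates the relationship; it is not the Bayesian method implemented in `vacalibration()`, which also accounts for sampling variability and uncertainty in `M`.

```{r, eval=F}
# Hypothetical 3-cause misclassification matrix.
# Rows: true cause; columns: cause assigned by the algorithm. Each row sums to 1.
M <- matrix(c(0.80, 0.15, 0.05,
              0.10, 0.70, 0.20,
              0.05, 0.15, 0.80),
            nrow = 3, byrow = TRUE,
            dimnames = list(true = c("A", "B", "C"), va = c("A", "B", "C")))

p_true <- c(A = 0.5, B = 0.3, C = 0.2)  # hypothetical true CSMF

# Expected CSMF reported by the algorithm: q_j = sum_i p_i * M[i, j]
q_va <- drop(t(M) %*% p_true)

# Hypothetical helper: plug-in inversion of q = t(M) p (illustration only)
naive_calibrate <- function(q, M) {
  p <- solve(t(M), q)  # solve the linear system t(M) %*% p = q
  p <- pmax(p, 0)      # inversion can give negative values when q is noisy
  p / sum(p)           # renormalise to a probability vector
}

round(q_va, 3)                      # uncalibrated CSMF differs from p_true
round(naive_calibrate(q_va, M), 3)  # recovers p_true because q has no noise here
```

With real data, `q` is estimated from a finite number of deaths and `M` is itself uncertain, which is why the package uses a probabilistic model instead of a direct inversion.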

# Getting Started

We start by installing and loading the `vacalibration` package in `R`. 

Install from CRAN:
```{r, eval=F}
install.packages("vacalibration")
library(vacalibration) # load
```

Install from GitHub:
```{r, eval=F}
# install.packages("devtools")  # uncomment to install devtools if needed
devtools::install_github("sandy-pramanik/vacalibration")
library(vacalibration) # load
```

[Back to top](#top)

# Example: COMSA-Mozambique Data

For illustration, we use the VA-only data included in this package. Stored as `comsamoz_CCVAoutput`, it contains the outputs of EAVA, InSilicoVA, and InterVA applied to the [publicly available](https://comsamozambique.org/data-access) VA-only data for children under age five in [COMSA-Mozambique](https://comsamozambique.org/). It can be loaded with:

```{r, eval=F}
data("comsamoz_CCVAoutput")

comsamoz_CCVAoutput$neonate$eava  # output from EAVA for neonates
comsamoz_CCVAoutput$neonate$insilicova  # output from InSilicoVA for neonates
comsamoz_CCVAoutput$neonate  # list of outputs for neonates from EAVA, InSilicoVA, and InterVA
```

Outputs for children can be similarly accessed as `comsamoz_CCVAoutput$child`.
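Uncalibrated CSMFs are simply the proportions of deaths assigned to each cause. A minimal base-R sketch is shown below, using a hypothetical vector of individual-level predicted causes; this vector is an assumption for illustration and does not reproduce the actual structure of `comsamoz_CCVAoutput`.

```{r, eval=F}
# Hypothetical individual-level predictions for 8 neonatal deaths
broad_causes <- c("congenital_malformation", "pneumonia", "sepsis_meningitis_inf",
                  "ipre", "other", "prematurity")
pred_cause <- c("ipre", "sepsis_meningitis_inf", "prematurity", "ipre",
                "pneumonia", "sepsis_meningitis_inf", "ipre", "other")

# Count deaths per cause; factor levels keep causes with zero deaths
va_counts <- table(factor(pred_cause, levels = broad_causes))

# Uncalibrated CSMF: share of deaths assigned to each cause
csmf_uncalib <- va_counts / sum(va_counts)
round(csmf_uncalib, 3)
```

A named count vector such as `setNames(as.vector(va_counts), names(va_counts))` has the same form as the death counts passed to `vacalibration()` in the CA CODE example below.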

[Back to top](#top)

# CCVA Misclassification Matrices {#sec-CCVA_missmat}

This is the inventory of uncertainty-quantified misclassification matrices for the CCVA algorithms EAVA, InSilicoVA, and InterVA. For CSMF estimates obtained with these algorithms, the matrices enable VA-Calibration to produce calibrated estimates. The matrices are estimated using the misclassification-matrix modeling framework of [Pramanik et al. (2025)](https://doi.org/10.1214/24-AOAS2006) and paired CHAMPS–VA cause-of-death data from the Child Health and Mortality Prevention Surveillance ([CHAMPS](https://champshealth.org/)) project (see [Pramanik et al. (2026)](https://doi.org/10.1136/bmjgh-2025-021747) for details). The inventory can be loaded with:

```{r, eval=F}
data("CCVA_missmat")
```

For EAVA among neonates in Mozambique, the average misclassification matrix, the uncertainty-quantified misclassification matrix expressed as a Dirichlet prior, and distributional summaries can be accessed as follows:

```{r, eval=F}
CCVA_missmat$neonate$eava$postmean$Mozambique  # average
CCVA_missmat$neonate$eava$asDirich$Mozambique  # Dirichlet approximation
CCVA_missmat$neonate$eava$postsumm$Mozambique  # summary of distribution
```
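Each row of a misclassification matrix is a probability vector, so one natural way to express its uncertainty is an independent Dirichlet distribution for each row. The sketch below assumes this row-wise form and draws matrices from it in base R using the standard gamma construction (independent gamma variables divided by their sum are Dirichlet distributed). The concentration matrix `alpha` and the helper `rdirichlet_rows()` are hypothetical; consult the package documentation for the exact format of `asDirich`.

```{r, eval=F}
set.seed(1)

# Hypothetical Dirichlet concentration parameters, one row per true cause
alpha <- matrix(c(8.0, 1.5, 0.5,
                  1.0, 7.0, 2.0,
                  0.5, 1.5, 8.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(true = c("A", "B", "C"), va = c("A", "B", "C")))

# Hypothetical helper: one draw of a matrix with independent Dirichlet rows
rdirichlet_rows <- function(alpha) {
  g <- matrix(rgamma(length(alpha), shape = alpha),
              nrow = nrow(alpha), dimnames = dimnames(alpha))
  g / rowSums(g)  # normalise each row so it sums to 1
}

draws <- replicate(2000, rdirichlet_rows(alpha))  # 3 x 3 x 2000 array
mc_mean <- apply(draws, c(1, 2), mean)            # Monte Carlo average matrix
exact_mean <- alpha / rowSums(alpha)              # Dirichlet mean, row by row
round(mc_mean - exact_mean, 3)                    # differences should be small
```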

Matrices for other algorithms, countries, and age groups can be accessed in the same way. Currently, `CCVA_missmat` provides misclassification matrices for three CCVA algorithms (EAVA, InSilicoVA, and InterVA, indexed as `eava`, `insilicova`, and `interva`) and two age groups (`neonate`, aged 0-27 days, and `child`, aged 1-59 months). Country-specific estimates are available for `Bangladesh`, `Ethiopia`, `Kenya`, `Mali`, `Mozambique`, `Sierra Leone`, and `South Africa`, and a combined estimate for all other countries is available as `other`, enabling global calibration.

For each age group, misclassification matrices are provided for the following broad causes:

-   <strong><u>Neonates</u></strong>: `"congenital_malformation"`, `"pneumonia"`, `"sepsis_meningitis_inf"` (sepsis/meningitis/infections), `"ipre"` (intrapartum-related events), `"other"`, and `"prematurity"`.
-   <strong><u>Children</u></strong>: `"malaria"`, `"pneumonia"`, `"diarrhea"`, `"severe_malnutrition"`, `"hiv"`, `"injury"`, `"other"`, `"other_infections"`, and `"nn_causes"` (neonatal causes consisting of IPRE, congenital malformation, and prematurity).
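The broad cause labels above can be written out as character vectors, which makes it easy to check whether a set of study causes already matches the CHAMPS broad causes or needs the cause mapping discussed later in this vignette. The helper `needs_cause_map()` below is hypothetical and not part of `vacalibration`.

```{r, eval=F}
# CHAMPS broad causes covered by CCVA_missmat (labels as listed above)
champs_broad_causes <- list(
  neonate = c("congenital_malformation", "pneumonia", "sepsis_meningitis_inf",
              "ipre", "other", "prematurity"),
  child = c("malaria", "pneumonia", "diarrhea", "severe_malnutrition", "hiv",
            "injury", "other", "other_infections", "nn_causes")
)

# Hypothetical helper: TRUE when some study causes are not CHAMPS broad causes,
# i.e. when a mapping from study causes to broad causes is needed
needs_cause_map <- function(study_causes, age_group) {
  !all(study_causes %in% champs_broad_causes[[age_group]])
}

needs_cause_map(c("ipre", "prematurity", "other"), "neonate")  # FALSE
needs_cause_map(c("Intrapartum", "LRI", "Sepsis"), "neonate")  # TRUE
```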

If misclassification matrices are available for the age group, algorithm, and country of interest, users only need to provide `va_data` (a list whose elements are named by algorithm), `age_group`, and `country`; `vacalibration()` then automatically fetches the appropriate misclassification matrix from `CCVA_missmat`. If no matching matrices are available, users must provide them (see the `missmat` argument in `vacalibration()` for details).
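When supplying a fixed matrix through `missmat`, it can help to check that it is a valid misclassification matrix before calling `vacalibration()`. The hypothetical helper below checks only generic properties (square, non-negative, rows summing to one, and row and column names matching the causes); the exact structure that `vacalibration()` expects for `missmat` is described in its help page (`?vacalibration`).

```{r, eval=F}
# Hypothetical helper (not part of vacalibration): generic validity checks for
# a fixed misclassification matrix with rows = true cause, columns = VA cause
check_missmat <- function(M, causes, tol = 1e-8) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M))
  if (any(M < 0)) stop("entries must be non-negative")
  if (any(abs(rowSums(M) - 1) > tol)) stop("each row must sum to 1")
  if (!identical(rownames(M), causes) || !identical(colnames(M), causes))
    stop("row and column names must match the causes, in the same order")
  invisible(TRUE)
}

causes <- c("ipre", "prematurity", "other")
M_user <- matrix(c(0.85, 0.10, 0.05,
                   0.10, 0.80, 0.10,
                   0.05, 0.05, 0.90),
                 nrow = 3, byrow = TRUE, dimnames = list(causes, causes))
check_missmat(M_user, causes)  # passes silently
```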

`vacalibration()` also supports posterior samples of misclassification matrices, such as those in the version of `CCVA_missmat` hosted on the [GitHub repository](https://github.com/sandy-pramanik/CCVA-Misclassification-Matrices). For the example above, the samples can be accessed as `CCVA_missmat$neonate$eava$postsamples$Mozambique`.

[Back to top](#top)

# Implementing VA-Calibration

In the following example, we demonstrate how `vacalibration()` can be used to perform algorithm-specific and ensemble calibrations, and generate calibrated CSMF estimates. For brevity, we exclude the diagnostic and summary plots as well as the detailed output of the posterior sampling.

[Back to top](#top)

## Integration with VA Workflow

### Algorithm-Specific

Below is an example of EAVA-specific VA-Calibration for neonates in Mozambique:

```{r, eval=F, results = 'hide', message = FALSE, warning = FALSE, fig.show = "hide"}
vacalib_eava = vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava), 
                             age_group = "neonate", country = "Mozambique")

# CSMF
vacalib_eava$p_uncalib[1,]  # uncalibrated estimates
vacalib_eava$p_calib[1,,]  # posterior of calibrated estimates
vacalib_eava$pcalib_postsumm[1,,]  # posterior summary of calibrated estimates

# death counts
vacalib_eava$va_deaths_uncalib[1,]  # uncalibrated
vacalib_eava$va_deaths_calib_algo[1,]  # calibrated
```

InSilicoVA- and InterVA-specific VA-Calibration can be performed similarly by using `va_data = list("insilicova" = comsamoz_CCVAoutput$neonate$insilicova)` or `va_data = list("interva" = comsamoz_CCVAoutput$neonate$interva)` instead.

Use `missmat_type` to control uncertainty propagation. `missmat_type = "fixed"` calibrates using a fixed misclassification matrix (by default, the average matrix in `CCVA_missmat`) and does not propagate the uncertainty in the matrix. `missmat_type = "prior"` (the package default) and `missmat_type = "samples"` propagate this uncertainty and are recommended.

To calibrate with posterior samples, set `missmat_type = "samples"` and `missmat = CCVA_missmat$neonate$eava$postsamples$Mozambique` in the example above. Note that the `CCVA_missmat` object included in the package does not contain posterior samples because of file size limits. If needed, obtain them from the `CCVA_missmat` object in the [GitHub repository](https://github.com/sandy-pramanik/CCVA-Misclassification-Matrices) and pass them to `vacalibration()`.
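Why propagating matrix uncertainty matters can be sketched with simulated matrices. Below, hypothetical posterior draws of a 3-cause misclassification matrix are simulated from row-wise Dirichlet distributions centred on an average matrix, and a naive plug-in inversion of `q = t(M) %*% p` is applied to each draw. A fixed matrix yields a single calibrated estimate, whereas the per-draw results yield an interval. This illustrates the idea only: `vacalibration()` uses a Bayesian model rather than this inversion, and the concentration value `conc`, the matrices, and the helper functions are assumptions.

```{r, eval=F}
set.seed(42)

# Hypothetical average misclassification matrix (rows: true cause, cols: VA cause)
M_mean <- matrix(c(0.80, 0.15, 0.05,
                   0.10, 0.70, 0.20,
                   0.05, 0.15, 0.80), nrow = 3, byrow = TRUE)
q_va <- c(0.44, 0.315, 0.245)  # hypothetical uncalibrated CSMF from the algorithm

# Naive plug-in calibration (illustration only): solve t(M) %*% p = q
naive_calibrate <- function(q, M) {
  p <- pmax(solve(t(M), q), 0)  # clip negative values
  p / sum(p)                    # renormalise
}

# One simulated draw of M: each row Dirichlet with mean equal to that row of M_mean
conc <- 50  # assumed concentration; larger values mean less uncertainty
draw_M <- function() {
  g <- matrix(rgamma(length(M_mean), shape = conc * M_mean), nrow = nrow(M_mean))
  g / rowSums(g)
}

p_fixed <- naive_calibrate(q_va, M_mean)                        # fixed matrix: one estimate
p_draws <- t(replicate(1000, naive_calibrate(q_va, draw_M())))  # 1000 x 3 matrix
p_interval <- apply(p_draws, 2, quantile, probs = c(0.025, 0.975))

round(p_fixed, 3)
round(p_interval, 3)  # 95% intervals reflecting uncertainty in M
```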

[Back to top](#top)

### Ensemble

To perform ensemble calibration, provide a list of algorithm-specific CCVA outputs. This performs both algorithm-specific calibration for each algorithm and an ensemble calibration that combines them. Set `ensemble = FALSE` to turn off ensemble calibration.

```{r, eval=F, results = 'hide', message = FALSE, warning = FALSE, fig.show = "hide"}
vacalib_ensemble = 
  vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava,
                               "insilicova" = comsamoz_CCVAoutput$neonate$insilicova,
                               "interva" = comsamoz_CCVAoutput$neonate$interva),
                age_group = "neonate", country = "Mozambique")

# CSMF
vacalib_ensemble$p_uncalib  # uncalibrated estimates

# posterior of calibrated CSMF
vacalib_ensemble$p_calib["eava",,]  # EAVA
vacalib_ensemble$p_calib["insilicova",,]  # InSilicoVA
vacalib_ensemble$p_calib["interva",,]  # InterVA
vacalib_ensemble$p_calib["ensemble",,]  # ensemble

# posterior summary of calibrated CSMF
vacalib_ensemble$pcalib_postsumm["eava",,]  # EAVA
vacalib_ensemble$pcalib_postsumm["insilicova",,]  # InSilicoVA
vacalib_ensemble$pcalib_postsumm["interva",,]  # InterVA
vacalib_ensemble$pcalib_postsumm["ensemble",,]  # ensemble

# death counts
vacalib_ensemble$va_deaths_uncalib  # uncalibrated
vacalib_ensemble$va_deaths_calib_algo  # calibrated counts from algorithm-specific calibration
vacalib_ensemble$va_deaths_calib_ensemble  # calibrated counts from ensemble calibration
```

If `missmat` includes user-specified matrices, then `age_group` and `country` are not required.

Calibration for children can be performed similarly.
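The intuition behind ensemble calibration can be sketched in a simplified, noise-free setting. Assuming that all algorithms share one true CSMF `p` and that each algorithm `k` has its own misclassification matrix `M_k` with `q_k = t(M_k) %*% p`, the equations from the different algorithms can be stacked and solved jointly for a single `p` by least squares. The matrices below are hypothetical, and `vacalibration()` fits a Bayesian ensemble model rather than this least-squares solution.

```{r, eval=F}
# Hypothetical misclassification matrices for two algorithms (rows: true cause)
M_algo1 <- matrix(c(0.80, 0.15, 0.05,
                    0.10, 0.70, 0.20,
                    0.05, 0.15, 0.80), nrow = 3, byrow = TRUE)
M_algo2 <- matrix(c(0.70, 0.20, 0.10,
                    0.20, 0.60, 0.20,
                    0.10, 0.10, 0.80), nrow = 3, byrow = TRUE)
p_shared <- c(0.5, 0.3, 0.2)  # hypothetical true CSMF shared by both algorithms

# CSMF each algorithm would report in the absence of sampling noise
q_algo1 <- drop(t(M_algo1) %*% p_shared)
q_algo2 <- drop(t(M_algo2) %*% p_shared)

# Stack both systems t(M_k) %*% p = q_k and solve for one p by least squares
A <- rbind(t(M_algo1), t(M_algo2))  # 6 x 3
b <- c(q_algo1, q_algo2)            # length 6
p_ensemble <- qr.solve(A, b)
round(p_ensemble, 3)
```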

[Back to top](#top)

### Visualization

The output of the `vacalibration()` function can be directly passed to `plot_vacalib()` to generate a plot that summarizes the main components of VA-Calibration. By default, it displays the misclassification matrix used for calibration and shows both the uncalibrated and calibrated CSMF estimates. For instance, when calibrating for EAVA as demonstrated above, the summary plot can be obtained as:

```{r, eval=F, results = 'hide', message = FALSE, warning = FALSE, fig.show = "hide"}
plot_vacalib(vacalib_fit = vacalib_eava)
```
```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics("figures/vacalib_eava_plot.png")
```

Grey rows and columns in the misclassification matrix indicate causes that are not calibrated. Set `toplot = "missmat"` or `toplot = "csmf"` in `plot_vacalib()` to plot only the misclassification matrix or only the comparison of CSMF estimates. The same applies to InSilicoVA, InterVA, and ensemble VA-Calibration. The plotted misclassification matrix depends on the `missmat_type` specified in `vacalibration()`: if `missmat_type = "fixed"`, the fixed misclassification matrix used in calibration is plotted; if `missmat_type` is `"prior"` or `"samples"`, the average misclassification matrix is displayed.

[Back to top](#top)

## Causes Outside CHAMPS Broad Causes

As discussed in [CCVA Misclassification Matrices](#sec-CCVA_missmat), the matrices in `CCVA_missmat` are available only for the CHAMPS broad causes. When the causes in `va_data` are not a subset of the CHAMPS broad causes, a cause-mapping step is required. One such application is the [CA CODE](https://childmortality.org/about) project, which compiles VA-based death counts across multiple countries. For example, a study in Bangladesh analyzed 302 neonatal deaths using EAVA and reported 82 deaths due to *Intrapartum*, 17 due to *Congenital*, 6 due to *Diarrhoeal*, 33 due to *LRI*, 108 due to *Sepsis*, 35 due to *Preterm*, 14 due to *Tetanus*, and 7 due to *Other*.

In such cases, `vacalibration()` requires specifying `studycause_map`, a mapping from the study causes to the CHAMPS broad causes. For this example, following expert guidance, we define:

```{r, eval=F, results = 'hide', message = FALSE, warning = FALSE, fig.show = "hide"}
set_studycause_map = c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
                       "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
                       "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity", 
                       "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")
```

`vacalibration()` uses this mapping to convert the misclassification matrices in `CCVA_missmat` to the study causes, enabling VA-Calibration. The calibration can then be run as:

```{r, eval=F, results = 'hide', message = FALSE, warning = FALSE, fig.show = "hide"}
vacalib_cacode = vacalibration(va_data = list("eava" = c("Intrapartum" = 82, "Congenital" = 17,
                                                         "Diarrhoeal" = 6, "LRI" = 33,
                                                         "Sepsis" = 108, "Preterm" = 35, 
                                                         "Tetanus" = 14, "Other" = 7)), 
                               age_group = "neonate", country = "Bangladesh",
                               studycause_map = set_studycause_map)

# CSMF
vacalib_cacode$p_uncalib[1,]  # uncalibrated estimates
vacalib_cacode$p_calib[1,,]  # posterior of calibrated estimates
vacalib_cacode$pcalib_postsumm[1,,]  # posterior summary of calibrated estimates

# death counts
vacalib_cacode$va_deaths_uncalib[1,]  # uncalibrated
vacalib_cacode$va_deaths_calib_algo[1,]  # calibrated
```

The `studycause_map` argument is required only when using the misclassification matrices from `CCVA_missmat`. If `missmat` includes user-specified matrices, then `age_group`, `country`, and `studycause_map` are not required. As with algorithm-specific calibration, `vacalib_cacode` can be passed to `plot_vacalib()` to generate the summary plot of the misclassification matrix and CSMF estimates.
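To see how `set_studycause_map` groups the study causes, the CA CODE counts can be aggregated to the CHAMPS broad causes in base R. This sketch only illustrates the grouping implied by the map (one broad cause per study cause); as described above, `vacalibration()` uses the map to convert the misclassification matrices to the study causes rather than collapsing the counts.

```{r, eval=F}
study_counts <- c("Intrapartum" = 82, "Congenital" = 17, "Diarrhoeal" = 6, "LRI" = 33,
                  "Sepsis" = 108, "Preterm" = 35, "Tetanus" = 14, "Other" = 7)
set_studycause_map <- c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
                        "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
                        "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity",
                        "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")

# Every study cause should have exactly one mapped broad cause
stopifnot(setequal(names(study_counts), names(set_studycause_map)))

# Sum study-cause counts within each mapped CHAMPS broad cause
broad_counts <- tapply(study_counts, set_studycause_map[names(study_counts)], sum)
broad_counts
broad_counts / sum(broad_counts)  # broad-cause shares before calibration
```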

[Back to top](#top)