--- title: "Recoding CHMS medication variables" format: html vignette: > %\VignetteIndexEntry{Recoding CHMS medication variables} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} --- ```{r results='hide', message=FALSE, warning=FALSE, eval=TRUE} library(chmsflow) library(recodeflow) library(dplyr) ``` ## 1. Introduction chmsflow provides 16 functions that classify medications from ATC codes recorded in CHMS clinic data. Each function checks whether a respondent is taking a specific drug class and returns `1` (yes) or `0` (no), with `haven::tagged_na()` codes for missing or not-applicable responses. ### Available medication variables | Variable | Drug class | ATC prefix | Cycles 3--6 function | Cycles 1--2 function | |----------|------------|------------|----------------------|----------------------| | `ace_med` | ACE inhibitors | C09 | `is_ace_inhibitor()` | `is_ace_med_cycles1to2()` | | `bb_med` | Beta blockers | C07 | `is_beta_blocker()` | `is_bb_med_cycles1to2()` | | `ccb_med` | Calcium channel blockers | C08 | `is_calcium_channel_blocker()` | `is_ccb_med_cycles1to2()` | | `diur_med` | Diuretics | C03 | `is_diuretic()` | `is_diur_med_cycles1to2()` | | `misc_htn_med` | Other antihypertensives | mixed | `is_other_antihtn_med()` | `is_misc_htn_med_cycles1to2()` | | `any_htn_med` | Any antihypertensive | combined | `is_any_antihtn_med()` | `is_any_htn_med_cycles1to2()` | | `nsaid_med` | NSAIDs | M01A | `is_nsaid()` | `is_nsaid_med_cycles1to2()` | | `diab_med` | Diabetes medications | A10 | `is_diabetes_med()` | `is_diab_med_cycles1to2()` | ### Cycle differences Medication data is structured differently across CHMS cycles: - **Cycles 1--2** store medications in a flat format with up to 80 individual columns (`atc_101a` to `atc_235a` for ATC codes, `mhr_101b` to `mhr_235b` for time last taken). The cycles 1--2 wrapper functions accept all of these columns as parameters. - **Cycles 3--6** store medications in a multi-row format with two variables per row: `meucatc` (ATC code) and `npi_25b` (time last taken). Each respondent may have multiple rows. After recoding, results must be aggregated by `clinicid`. ## 2. When to use medication recoding If your analysis requires medication variables, ***always perform medication recoding first***, before recoding any other variables. Two downstream health outcome variables depend on medication status: - **Hypertension** -- `any_htn_med` must be merged into the main cycle dataset before deriving hypertension outcomes. - **Diabetes** -- `diab_med` must be merged before deriving diabetes outcomes. ## 3. Workflow The workflow is the same for all cycles: recode medication variables and merge into the main cycle dataset using `recode_meds_cycles1to2()` or `recode_meds_cycles3to6()`, then derive health outcomes using `recode_after_meds()`. Use `recode_after_meds()` instead of `rec_with_table()` -- it automatically excludes medication-specific rows from `variable_details` so pre-computed medication columns are passed through rather than re-derived. ### 3.1 Cycles 1--2 Cycles 1--2 medication data uses uppercase column names (`CLINICID`, `ATC_101A`, etc.). `recode_meds_cycles1to2()` normalizes these internally. **Step 1** -- Recode medication variables and merge with main cycle data. Requires: `cycle1`, `cycle1_meds`. ```{r, warning=FALSE} cycle1 <- recode_meds_cycles1to2(cycle1, cycle1_meds, c("any_htn_med", "diab_med")) ``` **Step 2** -- Derive diabetes status. Requires: `cycle1` from Step 1. ```{r, warning=FALSE} cycle1_diab_data <- recode_after_meds( cycle1, c("lab_hba1", "diab_a1c", "diab_med", "ccc_51", "diab_status") ) head(select(cycle1_diab_data, clinicid, diab_status)) ``` **Step 3** -- Derive hypertension status. Requires: `cycle1` from Step 1. ```{r, warning=FALSE} cycle1_htn_data <- recode_after_meds( cycle1, c( # Blood pressure (raw + adjusted) "bpmdpbps", "bpmdpbpd", "sbp_adj_mmhg", "dbp_adj_mmhg", # Medication inputs (merged in Step 1) "any_htn_med", "ccc_32", # Diabetes chain (input to htn functions) "lab_hba1", "diab_a1c", "ccc_51", "diab_med", "diab_status", # CVD chain "ccc_61", "ccc_63", "ccc_81", "cvd_status", # CKD chain "lab_bcre", "pgdcgt", "clc_sex", "clc_age", "gfr_ml_min", "ckd_status", # Hypertension outcomes "htn_status", "htn_adj_status", "htn_control_status", "htn_control_adj_status" ) ) head(select(cycle1_htn_data, clinicid, htn_status, htn_adj_status)) ``` ### 3.2 Cycles 3--6 **Step 1** -- Recode medication variables and merge with main cycle data. Requires: `cycle3`, `cycle3_meds`. ```{r, warning=FALSE} cycle3 <- recode_meds_cycles3to6(cycle3, cycle3_meds, c("any_htn_med", "diab_med")) ``` **Step 2** -- Derive diabetes status. Requires: `cycle3` from Step 1. ```{r, warning=FALSE} cycle3_diab_data <- recode_after_meds( cycle3, c("lab_hba1", "diab_a1c", "diab_med", "ccc_51", "diab_status") ) head(select(cycle3_diab_data, clinicid, diab_status)) ``` **Step 3** -- Derive hypertension status. Requires: `cycle3` from Step 1. `cvd_status`, `diab_status`, and `ckd_status` are intermediate inputs to the hypertension functions. Their full input chains must also be listed so `recode_after_meds()` can derive them. ```{r, warning=FALSE} cycle3_htn_data <- recode_after_meds( cycle3, c( # Blood pressure (raw + adjusted) "bpmdpbps", "bpmdpbpd", "sbp_adj_mmhg", "dbp_adj_mmhg", # Medication inputs (merged in Step 1) "any_htn_med", "ccc_32", # Diabetes chain (input to htn functions) "lab_hba1", "diab_a1c", "ccc_51", "diab_med", "diab_status", # CVD chain "ccc_61", "ccc_63", "ccc_81", "cvd_status", # CKD chain "lab_bcre", "pgdcgt", "clc_sex", "clc_age", "gfr_ml_min", "ckd_status", # Hypertension outcomes "htn_status", "htn_adj_status", "htn_control_status", "htn_control_adj_status" ) ) head(select(cycle3_htn_data, clinicid, htn_status, htn_adj_status)) ``` ## 4. Advanced: using individual classification functions The `is_*` functions underlie the wrapper functions and are available directly for custom workflows -- for example, deriving a single drug class without the full pipeline, or integrating classification logic into your own aggregation steps. Each function accepts an ATC code and a time-last-taken value and returns `1`, `0`, or a `tagged_na()` code: ```{r, warning=FALSE} # Single medication classification is_beta_blocker("C07AA05", 1) # returns 1 is_ace_inhibitor("C09AA02", 1) # returns 1 is_diabetes_med("A10BA02", 1) # returns 1 ``` ### Cycle format differences **Cycles 1--2** -- one row per respondent with up to 80 `atc_*/mhr_*` column pairs. The `is_*_med_cycles1to2()` variants accept named arguments for each slot: ```{r, warning=FALSE} # Classification using cycles 1--2 wide-format columns is_ace_med_cycles1to2(atc_101a = "C09AA02", mhr_101b = 1) # returns 1 is_ace_med_cycles1to2(atc_101a = "C09AA02", mhr_101b = 6) # returns 0 (not taken recently) ``` **Cycles 3--6** -- one row per medication per respondent with two columns: `meucatc` (ATC code) and `npi_25b` (time last taken). Classify per row, then aggregate across rows per respondent: ```{r, warning=FALSE} cycle3_meds |> mutate(ace_med = is_ace_inhibitor(meucatc, npi_25b)) |> aggregate_meds_by_person(variables = "ace_med") ``` ::: {.callout-warning} Avoid using `as.numeric(as.character(.x))` to aggregate medication columns. That pattern strips `tagged_na("a")` (valid skip) and `tagged_na("b")` (missing/refused) distinctions, collapsing them into plain `NA`. Use `aggregate_meds_by_person()` instead -- it preserves tagged-NA semantics across the aggregation. ::: ## Next steps - **Full analysis example** -- See how medication recoding fits into an end-to-end workflow in [Analysis walkthrough](analysis_walkthrough.html). - **Understand missing data** -- Learn how `tagged_na("a")` and `tagged_na("b")` are preserved through the medication pipeline in [Missing data (tagged_na)](tagged_na_usage.html). - **Inspect the metadata** -- See how medication variables are defined in `variable-details.csv` in [Variable schema reference](variables_and_variable_details.html). - **Work at an RDC** -- For loading real CHMS medication data at a Research Data Centre, see [Using chmsflow at an RDC](using_chmsflow_at_an_rdc.html).