---
title: "Getting Started with immunogenetr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with immunogenetr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

immunogenetr is a comprehensive toolkit for clinical HLA informatics, built on tidyverse principles. It uses the genotype list string (GL string) as its core data structure for storing and computing HLA genotype data.

This vignette walks through the main workflows:

1. Converting tabular HLA data to GL strings
2. Splitting GL strings back into individual loci
3. Calculating mismatches between recipient and donor
4. Summarizing HLA matching for transplantation
5. Working with HLA allele names (truncation, prefixes, regex)
6. Reading HML files

## Setup

```{r message=FALSE}
library(immunogenetr)
library(dplyr)
```

## Converting tabular HLA data to GL strings

Clinical HLA data is typically stored in a tabular format, with each allele in its own column. immunogenetr includes the `HLA_typing_1` dataset as an example:

```{r}
# HLA_typing_1 contains typing for 10 individuals across all classical HLA loci.
head(HLA_typing_1, 3)
```

The `HLA_columns_to_GLstring()` function converts these columns into a single GL string per individual. When used inside `mutate()`, pass `.` as the first argument to reference the working data frame:

```{r}
HLA_typing_GL <- HLA_typing_1 %>%
  # Convert all typing columns (A1 through DPB1_2) into a GL string.
  mutate(
    GL_string = HLA_columns_to_GLstring(., HLA_typing_columns = A1:DPB1_2),
    .after = patient
  ) %>%
  # Keep only patient ID and the new GL string column.
  select(patient, GL_string)

# View the GL strings.
(HLA_typing_GL)
```

Each GL string encodes the full genotype: alleles within a gene copy are separated by `/` (ambiguity), gene copies by `+`, and loci by `^`.

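
The delimiter hierarchy described above can be illustrated with base R alone. The following is a minimal sketch, not part of immunogenetr: `parse_gl_string()` is a hypothetical helper, the example GL string is made up for illustration, and only the three delimiters named above (`^`, `+`, `/`) are handled.

```{r}
# Illustrative base R only. parse_gl_string() is a hypothetical helper,
# not an immunogenetr function, and example_gl is a made-up GL string.
parse_gl_string <- function(gl) {
  # "^" separates loci.
  loci <- strsplit(gl, "^", fixed = TRUE)[[1]]
  lapply(loci, function(locus) {
    # "+" separates the gene copies at a locus.
    copies <- strsplit(locus, "+", fixed = TRUE)[[1]]
    # "/" separates ambiguous allele calls within one gene copy.
    strsplit(copies, "/", fixed = TRUE)
  })
}

example_gl <- "HLA-A*02:01/HLA-A*02:02+HLA-A*03:01^HLA-B*07:02+HLA-B*44:02"
parsed <- parse_gl_string(example_gl)

# Number of loci in the string.
length(parsed)

# Ambiguous allele calls for the first gene copy at the first locus.
parsed[[1]][[1]]
```

The result is a nested list (locus, then gene copy, then allele), which mirrors the nesting of the delimiters. For real work, the package's own GL string functions are the better choice; this sketch only makes the structure visible.
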

## Splitting GL strings into loci

To go the other direction, `GLstring_genes()` splits a GL string back into separate columns by locus:

```{r}
# Take the first patient's GL string and split it into locus columns.
# Note: GLstring_genes and GLstring_genes_expanded use pivot_longer on all
# columns, so only pass the GL string column (no other data types).
single_patient <- HLA_typing_GL[1, "GL_string", drop = FALSE]
GLstring_genes(single_patient, "GL_string")
```

For a fully expanded view with one allele per row, use `GLstring_genes_expanded()`:

```{r}
GLstring_genes_expanded(single_patient, "GL_string")
```

## Calculating HLA mismatches

The mismatch functions are the core of immunogenetr. They all take a recipient GL string, a donor GL string, one or more loci, and a direction.

Let's set up a recipient/donor pair:

```{r}
# Patient 7 is the recipient, patient 9 is the donor.
recip_gl <- HLA_typing_GL %>%
  filter(patient == 7) %>%
  pull(GL_string)

donor_gl <- HLA_typing_GL %>%
  filter(patient == 9) %>%
  pull(GL_string)
```

### Is there a mismatch? (`HLA_mismatch_logical`)

```{r}
# Check if there is an HLA-A mismatch in the graft-vs-host direction.
HLA_mismatch_logical(recip_gl, donor_gl, "HLA-A", direction = "GvH")

# Check host-vs-graft direction.
HLA_mismatch_logical(recip_gl, donor_gl, "HLA-A", direction = "HvG")
```

### How many mismatches? (`HLA_mismatch_number`)

```{r}
# Count bidirectional mismatches across several loci at once.
HLA_mismatch_number(
  recip_gl, donor_gl,
  c("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1"),
  direction = "bidirectional"
)
```

### Which alleles are mismatched? (`HLA_mismatched_alleles`)

```{r}
# Identify the specific mismatched alleles in the HvG direction.
HLA_mismatched_alleles(recip_gl, donor_gl, "HLA-A", direction = "HvG")
```

### Match count (`HLA_match_number`)

```{r}
# Count the number of matches (complement of mismatches).
HLA_match_number(
  recip_gl, donor_gl,
  c("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1"),
  direction = "bidirectional"
)
```

## HLA match summaries for transplantation

The `HLA_match_summary_HCT()` function provides standard match grades used in hematopoietic cell transplantation:

```{r}
# X-of-8 matching (A, B, C, DRB1 bidirectional).
HLA_match_summary_HCT(recip_gl, donor_gl,
  direction = "bidirectional",
  match_grade = "Xof8"
)

# X-of-10 matching (adds DQB1).
HLA_match_summary_HCT(recip_gl, donor_gl,
  direction = "bidirectional",
  match_grade = "Xof10"
)
```

### Finding the best donor

A common workflow is comparing one recipient against multiple potential donors:

```{r}
# Patient 3 is the recipient; compare against all 10 donors.
recipient <- HLA_typing_GL %>%
  filter(patient == 3) %>%
  select(GL_string) %>%
  rename(GL_string_recip = GL_string)

donors <- HLA_typing_GL %>%
  rename(GL_string_donor = GL_string, donor = patient) %>%
  # Cross-join to pair recipient with each donor.
  cross_join(recipient) %>%
  # Calculate 8/8 match grade for each pair.
  mutate(
    match_8of8 = HLA_match_summary_HCT(
      GL_string_recip, GL_string_donor,
      direction = "bidirectional",
      match_grade = "Xof8"
    ),
    .after = donor
  ) %>%
  # Sort best matches first.
  arrange(desc(match_8of8))

donors %>% select(donor, match_8of8)
```

## Working with HLA allele names

### Truncation

`HLA_truncate()` reduces allele resolution to a specified number of fields:

```{r}
# Truncate a four-field allele to two fields.
HLA_truncate("HLA-A*02:01:01:01", fields = 2)

# Works on full GL strings too.
HLA_truncate("HLA-A*02:01:01:01+HLA-A*03:01:01:02^HLA-B*07:02:01:01+HLA-B*44:02:01:01",
  fields = 2
)
```

### Prefix management

`HLA_prefix_remove()` and `HLA_prefix_add()` manage the `HLA-` and locus prefixes:

```{r}
# Remove all prefixes to get just the allele fields.
HLA_prefix_remove("HLA-A*02:01")

# Keep the locus designation but remove "HLA-".
HLA_prefix_remove("HLA-A*02:01", keep_locus = TRUE)

# Add the full prefix back.
HLA_prefix_add("02:01", "HLA-A*")

# "HLA-" is added by default.
HLA_prefix_add("A*02:01")
```

### Regex for GL string searching

`GLstring_regex()` creates regex patterns that accurately search within GL strings, preventing partial matches across field boundaries:

```{r}
gl <- "HLA-A*02:01:01+HLA-A*68:01^HLA-B*07:01+HLA-B*15:01"

# A two-field search correctly matches the three-field allele.
pattern <- GLstring_regex("HLA-A*02:01")
stringr::str_detect(gl, pattern)

# But won't falsely match a longer allele number.
stringr::str_detect("HLA-A*02:149:01", GLstring_regex("HLA-A*02:14"))
```

## Column name repair

When working in the tidyverse, column names with dashes and asterisks are inconvenient. `HLA_column_repair()` converts between WHO-standard (`HLA-A*`) and tidyverse-friendly (`HLA_A`) formats:

```{r}
# GLstring_genes returns tidyverse-friendly names by default.
repaired <- GLstring_genes(single_patient, "GL_string")
names(repaired)

# Convert back to WHO format with asterisks.
who_names <- HLA_column_repair(repaired, format = "WHO", asterisk = TRUE)
names(who_names)
```

## Reading HML files

The `read_HML()` function extracts GL strings from HML (HLA Markup Language) files, which are a standard format for reporting HLA typing results from next-generation sequencing:

```{r}
# immunogenetr ships with two example HML files.
hml_path <- system.file("extdata", "HML_1.hml", package = "immunogenetr")
hml_result <- read_HML(hml_path)
hml_result
```

## Disclaimer

This library is intended for research use. Any application making use of this package in a clinical setting will need to be independently validated according to local regulations.