---
title: "Using the convert argument"
author: "Thierry Onkelinx"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{Using the convert argument}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

The `convert` argument in `write_vc()` and `read_vc()` allows you to apply transformations to data columns during the write and read operations.
This is useful when you want to store data types that `git2rdata` doesn't support.
The only requirement is that two functions performing the transformation are available in some R package.
The first function converts the unsupported data type into a supported data type.
The second function reverts the supported data type into the original unsupported data type.

## Basic usage

The `convert` argument is a named list where:

- Names correspond to column names in your data frame
- Each element is a character vector of length 2 with names `write` and `read`
- Functions are specified in the format `"package::function"`

```{r setup}
library(git2rdata)
root <- tempfile("git2rdata-convert")
dir.create(root)
```

## Example: case conversion

A simple example is converting text to uppercase for storage while keeping it lowercase in R:

```{r case-conversion}
# Create sample data
data <- data.frame(
  id = 1:3,
  name = c("alice", "bob", "charlie"),
  stringsAsFactors = FALSE
)

# Write with case conversion
write_vc(
  data,
  file = "people",
  root = root,
  sorting = "id",
  convert = list(
    name = c(
      write = "base::toupper", # Convert to uppercase when writing
      read = "base::tolower" # Convert to lowercase when reading
    )
  )
)
```

The stored file contains the names in uppercase:

```{r check-storage}
# Check the raw file content
raw_content <- readLines(file.path(root, "people.tsv"))
cat(raw_content, sep = "\n")
```

When reading the data back, the conversion is automatically applied:

```{r read-back}
# Read the data back
result <- read_vc("people", root = root)
print(result)

# The convert specification is stored in the attributes
attr(result, "convert")
```

## Multiple columns

You can apply conversions to multiple columns:

```{r multiple-columns}
data2 <- data.frame(
  id = 1:2,
  first_name = c("alice", "bob"),
  last_name = c("smith", "jones"),
  stringsAsFactors = FALSE
)

write_vc(
  data2,
  file = "names",
  root = root,
  sorting = "id",
  convert = list(
    first_name = c(write = "base::toupper", read = "base::tolower"),
    last_name = c(write = "base::toupper", read = "base::tolower")
  )
)

result2 <- read_vc("names", root = root)
print(result2)
```

## Use cases

### Unsupported data type

`git2rdata` doesn't support 64-bit integers.
You can store them by converting them to character.

```{r unsupported, eval = FALSE}
mtcars2 <- mtcars |>
  dplyr::mutate(cyl = bit64::as.integer64(cyl))
write_vc(
  mtcars2,
  file = "mtcars2",
  convert = list(
    cyl = c(write = "bit64::as.character", read = "bit64::as.integer64")
  )
)
```

### Storage optimization

Convert numeric data to a more compact string representation:

```{r numeric-conversion, eval=FALSE}
# Example with custom conversion functions
# (requires defining custom functions in a package)
write_vc(
  data,
  file = "data",
  root = root,
  sorting = "id",
  convert = list(
    large_number = c(
      write = "mypackage::to_scientific",
      read = "mypackage::from_scientific"
    )
  )
)
```

### Data standardization

Ensure consistent formatting across different data sources:

```{r standardization, eval=FALSE}
# Convert dates to ISO format
write_vc(
  data,
  file = "events",
  root = root,
  sorting = "id",
  convert = list(
    event_date = c(
      write = "mypackage::to_iso_date",
      read = "mypackage::from_iso_date"
    )
  )
)
```

## Important notes

- **Package availability**: All packages referenced in the `convert` argument must be available when calling `write_vc()` and `read_vc()`. Package availability is checked at both write and read time.
- **Function validation**: The specified functions must exist in the specified packages. This is validated as well.
- **Metadata storage**: Conversion specifications are stored in the metadata YAML file, ensuring that `read_vc()` knows how to reverse the transformations.
- **Strict mode**: When updating existing files, changes to the `convert` argument are detected by `compare_meta()` and trigger an error in strict mode or a warning in non-strict mode.

## Limitations

- The `convert` argument only accepts functions in the `package::function` format. Anonymous functions and functions from the global environment are not supported.
- Conversions must be reversible. The `read` function must be able to restore the original data from the converted form.
- The conversion is applied before `meta()` processes the data, so optimizations (like factor encoding) work on the converted data.

```{r cleanup, include=FALSE}
unlink(root, recursive = TRUE)
```
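
The reversibility requirement listed under the limitations can be checked before writing any data. The sketch below is only an illustration: `check_round_trip()` is a hypothetical helper, not part of `git2rdata`, and it assumes both functions take a vector as their single argument. It resolves the two `"package::function"` strings, applies `write` followed by `read`, and compares the result with the input.

```r
# Hypothetical helper (not part of git2rdata): test whether a write/read
# pair of "package::function" strings restores the original vector.
check_round_trip <- function(x, convert) {
  stopifnot(
    is.character(convert),
    length(convert) == 2,
    all(c("write", "read") %in% names(convert))
  )
  # Turn a "package::function" string into the function object
  resolve <- function(spec) {
    parts <- strsplit(spec, "::", fixed = TRUE)[[1]]
    stopifnot(length(parts) == 2)
    getExportedValue(parts[1], parts[2])
  }
  stored <- resolve(convert[["write"]])(x)
  restored <- resolve(convert[["read"]])(stored)
  identical(restored, x)
}

convert <- c(write = "base::toupper", read = "base::tolower")
# All lowercase input survives the round trip
ok_lower <- check_round_trip(c("alice", "bob"), convert)
# Mixed case input does not: the capital letters are lost on reading
ok_mixed <- check_round_trip(c("Alice", "Bob"), convert)
```

Here `ok_lower` is `TRUE` and `ok_mixed` is `FALSE`, so the case conversion used earlier in this vignette is only safe for columns that contain lowercase text exclusively.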