---
title: "Getting started with rocrateR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with rocrateR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rocrateR)
```

## Introduction

Reproducible research requires more than sharing files. We also need structured metadata describing:

* What the files contain
* Who created them
* How they were produced
* What software was used
* How components relate to each other

RO-Crate is a lightweight standard for packaging research outputs with rich, machine-readable metadata.

`{rocrateR}` lets you create and manage RO-Crates directly from R.

## What is an RO-Crate?

An RO-Crate is:

* A folder
* Containing research files
* Plus a metadata file: `ro-crate-metadata.json`

The metadata describes all files and their relationships using a graph model.

## RO-Crate Structure

Example:

```
my_crate/
├── ro-crate-metadata.json
├── data/
│   └── results.csv
└── analysis.R
```

* Files are the research artefacts
* Metadata links everything together

--------------------------------------------------------------------------------

## 1. Functions Overview

| Function | Purpose |
|-----------|----------|
| `rocrate()` | Create an empty or initialized RO-Crate |
| `entity()` | Define a new entity (Person, Dataset, etc.) |
| `add_entity()` / `add_entities()` | Add entities to a crate. Note that `add_entities()` is now deprecated and `add_entity()` is preferred. |
| `get_entity()` | Retrieve entities by `@id` or `@type` |
| `remove_entity()` / `remove_entities()` | Remove one or more entities. Note that `remove_entities()` is now deprecated and `remove_entity()` is preferred. |
| `load_rocrate()` | Higher-level function that loads an RO-Crate from a metadata file, a crate directory or a BagIt archive |
| `write_rocrate()` | Save RO-Crate to disk |
| `bag_rocrate()` / `is_rocrate_bag()` / `unbag_rocrate()` | Bagging and unbagging RO-Crates |
| `validate_rocrate()` | Validate RO-Crate and generate report |

## 2. First RO-Crate

The following command creates the contents of an RO-Crate Metadata descriptor (`ro-crate-metadata.json`) as an R object. Once written to disk, this file should be stored in the root (`./`) of your RO-Crate.

```{r example}
# library(rocrateR)
my_first_ro_crate <- rocrateR::rocrate()
```

This object is a list with the basic components of an RO-Crate. It can be visualised in the console as follows:

```{r}
my_first_ro_crate
```

This object can be saved to disk using the following command:

```{r, eval = FALSE}
my_first_ro_crate |>
  rocrateR::write_rocrate("/path/to/ro-crate/ro-crate-metadata.json")
```

For example, using a temporary directory:

```{r}
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
my_first_ro_crate |>
  rocrateR::write_rocrate(tmp)

# load lines / flat file
readLines(tmp)

# delete temporary file
unlink(tmp)
```

## 3. Including additional entities

In the previous section we created a very basic RO-Crate with the `rocrateR::rocrate()` function; however, you will usually want to include additional entities in your RO-Crate. Entities must contain at least two components: `@id` and `@type` (see [https://w3id.org/ro/crate/1.2/](https://w3id.org/ro/crate/1.2/) for details).
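To make the two required components concrete, the sketch below builds the bare structure by hand as a plain R list. It is an illustration only and does not use the `{rocrateR}` API: the object name `minimal_entity` and the identifier `#example-person` are hypothetical, and the JSON rendering assumes that `{jsonlite}` is installed.

```r
# Hand-built illustration only; this is NOT how rocrateR represents entities
# internally. `minimal_entity` and "#example-person" are hypothetical names.
minimal_entity <- list(
  `@id` = "#example-person",
  `@type` = "Person"
)

# check that both required components are present
required_keys <- c("@id", "@type")
has_required <- all(required_keys %in% names(minimal_entity))
has_required

# optionally render the list as JSON (assumes {jsonlite} is installed)
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::toJSON(minimal_entity, auto_unbox = TRUE, pretty = TRUE))
}
```

In practice, `rocrateR::entity()` takes care of building this structure, so the hand-built list is only useful for seeing what the two required keys look like.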
For example, a contextual entity can be defined as follows:

```{r}
# create entity for an organisation
organisation_uol <- rocrateR::entity(
  id = "https://ror.org/04xs57h96",
  type = "Organization",
  name = "University of Liverpool",
  url = "http://www.liv.ac.uk"
)

# create an entity for a person
person_rvd <- rocrateR::entity(
  id = "https://orcid.org/0000-0001-5036-8661",
  type = "Person",
  name = "Roberto Villegas-Diaz"
)
```

These entities can be attached to an RO-Crate using the `rocrateR::add_entity()` function, and linked to entities already in the crate with `rocrateR::add_entity_value()`:

```{r}
my_second_ro_crate <- rocrateR::rocrate() |>
  rocrateR::add_entity(person_rvd) |>
  # link the person to the root data entity as its author
  rocrateR::add_entity_value(
    id = "./",
    key = "author",
    value = list(`@id` = person_rvd$`@id`)
  ) |>
  rocrateR::add_entity(organisation_uol) |>
  # link the organisation to the person as their affiliation
  rocrateR::add_entity_value(
    id = "https://orcid.org/0000-0001-5036-8661",
    key = "affiliation",
    value = list(`@id` = organisation_uol$`@id`)
  )
```

Alternatively, the same result can be achieved by passing the entities directly to `rocrateR::rocrate()`:

```{r, eval = FALSE}
my_second_ro_crate <- rocrateR::rocrate(person_rvd, organisation_uol) |>
  rocrateR::add_entity_value(
    id = "./",
    key = "author",
    value = list(`@id` = person_rvd$`@id`)
  ) |>
  rocrateR::add_entity_value(
    id = person_rvd$`@id`,
    key = "affiliation",
    value = list(`@id` = organisation_uol$`@id`)
  )
```

```{r}
my_second_ro_crate
```

## 4. Wrangle RO-Crate

Previously, we covered how to include additional entities. Other valid operations are extracting (`rocrateR::get_entity()`) and removing (`rocrateR::remove_entity()`) entities.

### 4.1. Set up

```{r}
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create some entities for a project and datasets
dataset_entities <- seq_len(2) |>
  lapply(\(x) rocrateR::entity(x, type = "Dataset", name = paste0("Data ", x)))

project_entity <- rocrateR::entity(
  "#proj101",
  type = "Project",
  name = "Project 101",
  hasPart = dataset_entities |>
    lapply(\(x) list(`@id` = x[["@id"]]))
)

# add project and entities to the RO-Crate
basic_ro_crate <- basic_ro_crate |>
  rocrateR::add_entity(project_entity) |>
  # note that here we are using both `rocrateR::add_entity()` and `rocrateR::add_entities()`
  rocrateR::add_entities(dataset_entities)

basic_ro_crate
```

### 4.2. Extract entity

We can extract entities via the `@id`, `@type` or both:

#### 4.2.1. Extract using `@id`

```{r}
basic_ro_crate_project <- basic_ro_crate |>
  rocrateR::get_entity(id = "#proj101")

basic_ro_crate_project
```

#### 4.2.2. Extract using `@type`

```{r}
basic_ro_crate_datasets <- basic_ro_crate |>
  rocrateR::get_entity(type = "Dataset")

basic_ro_crate_datasets
```

#### 4.2.3. Extract using `@id` and `@type`

```{r}
basic_ro_crate_dataset_root <- basic_ro_crate |>
  rocrateR::get_entity(id = "./", type = "Dataset")

basic_ro_crate_dataset_root
```

### 4.3. Remove entity

Similarly, we can remove entities from an RO-Crate:

#### 4.3.1. Remove using scalar `@id`

```{r}
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity("#proj101")
```

#### 4.3.2. Remove using `entity` object

```{r}
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(project_entity)
```

#### 4.3.3. Remove multiple entities

```{r}
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(dataset_entities)
```

## 5. Create an RO-Crate Bag

Here we will explore the BagIt file packaging format, which is the recommended format for _bagging_ RO-Crates.
BagIt is described in [RFC 8493](https://doi.org/10.17487/RFC8493):

> [BagIt is] … a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive metadata "tags" and a file "payload" but does not require knowledge of the payload’s internal semantics. This BagIt format is suitable for reliable storage and transfer.

In this package, the function `rocrateR::bag_rocrate()` takes either a `path` pointing to the root of an RO-Crate (which must contain at least an RO-Crate metadata descriptor file, `ro-crate-metadata.json`) or an RO-Crate object created with `rocrateR::rocrate()` (and alternatives), as shown in section 2.

For more details, run the following command:

```r
?rocrateR::bag_rocrate
```

### 5.1. `rocrateR::bag_rocrate()`

Here we will create an RO-Crate bag inside a temporary directory:

```{r}
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create temporary directory
tmp_dir <- file.path(tempdir(), paste0("rocrate-", digest::digest(basename(tempfile()))))
dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)

# then, we can create the RO-Crate bag
path_to_rocrate_bag <- basic_ro_crate |>
  rocrateR::bag_rocrate(path = tmp_dir)
```

### 5.2. `rocrateR::is_rocrate_bag()`

We can use the function `rocrateR::is_rocrate_bag()` to verify that a given path points to a ZIP file or a directory with a valid RO-Crate bag. The expected files are:

- `bagit.txt` with the BagIt [definition](https://www.rfc-editor.org/rfc/rfc8493.html#section-2.2.2)
- `data` directory with the [payload](https://www.rfc-editor.org/rfc/rfc8493.html#section-2.1.2) of the RO-Crate
- `manifest-[algorithm].txt` with the checksum for each file inside the `data` directory

```{r}
path_to_rocrate_bag |>
  rocrateR::is_rocrate_bag()
```

The RO-Crate can then be loaded and displayed:

```{r}
path_to_rocrate_bag |>
  rocrateR::load_rocrate()
```

### 5.3. `rocrateR::unbag_rocrate()`

We can explore the contents of the RO-Crate bag with the following commands:

```{r, echo=FALSE, eval=FALSE}
# list files without unzipping
unzip(path_to_rocrate_bag, list = TRUE)
```

```{r}
# extract files in temporary directory
path_to_rocrate_bag_contents <- path_to_rocrate_bag |>
  rocrateR::unbag_rocrate(output = file.path(tmp_dir, "ROC"))

# create tree with the files
fs::dir_tree(path_to_rocrate_bag_contents)
```

```{r}
# delete temporary directory
unlink(tmp_dir, recursive = TRUE, force = TRUE)
```

## 6. Validation

> Advanced validation using the Python `rocrate-validator` is optional and requires `{reticulate}`.

# Appendix

## A1. Advanced Validation (experimental)

As you develop your RO-Crates, you might want to validate them. Several validators are available (some are listed at https://www.researchobject.org/ro-crate/tools); here we will explore the Python package [`rocrate-validator`](https://github.com/crs4/rocrate-validator). For installation details, please visit https://github.com/crs4/rocrate-validator.

`r knitr::asis_output("\U26A0")` The validation workflow depends on Python’s [`rocrate-validator`](https://github.com/crs4/rocrate-validator). Ensure you have a working Python installation and that [`{reticulate}`](https://cran.r-project.org/package=reticulate) is configured correctly (check with `reticulate::py_config()`). On Windows, you may need to restart R after installation.

### A1.1. Install [`{reticulate}`](https://cran.r-project.org/package=reticulate)

``` r
pak::pkg_install("reticulate")
```

### A1.2. Install [`rocrate-validator`](https://github.com/crs4/rocrate-validator)

``` r
reticulate::py_install("roc-validator", envname = "rocrateR")
```

### A1.3. Create example RO-Crate and validate it

```{r, eval = interactive()}
basic_ro_crate <- rocrateR::rocrate()

# store crate inside temporary directory
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
basic_ro_crate |>
  rocrateR::write_rocrate(tmp)

# wrap crate into zip file (expected by validator)
# paste0() avoids the space that paste() would insert before ".zip"
tmp_zip <- paste0(tmp, ".zip")
zip(tmp_zip, tmp)

# validate (note the name of the module: rocrate_validator)
reticulate::use_virtualenv("rocrateR")
rocrate_validator <- reticulate::import("rocrate_validator")
status <- rocrate_validator$utils$validate_rocrate_uri(tmp_zip)

if (status) {
  message("RO-Crate is valid!")
} else {
  message("RO-Crate is invalid!")
}

# delete temporary files
unlink(tmp)
unlink(tmp_zip)
```
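
Because the validation step depends on an optional Python setup, it can be useful to check the prerequisites before running it. The following is a minimal sketch of such a check, not part of `{rocrateR}`: the helper name `validator_available()` is hypothetical, and it assumes the validator was installed into a virtual environment called `rocrateR`, as in A1.2.

```r
# Hypothetical helper (not part of rocrateR): returns TRUE only when the
# Python module `rocrate_validator` can be imported from the given
# virtual environment, and FALSE in every other situation.
validator_available <- function(envname = "rocrateR") {
  # {reticulate} itself must be installed
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  tryCatch(
    {
      # stop early if the virtual environment does not exist
      if (!reticulate::virtualenv_exists(envname)) {
        return(FALSE)
      }
      # bind to the environment and try to import the module
      reticulate::use_virtualenv(envname, required = TRUE)
      isTRUE(reticulate::py_module_available("rocrate_validator"))
    },
    # any configuration problem is treated as "not available"
    error = function(e) FALSE
  )
}

if (validator_available()) {
  message("rocrate-validator is available.")
} else {
  message("rocrate-validator not found; skipping advanced validation.")
}
```

Returning `FALSE` instead of raising an error lets a script or vignette skip the advanced validation when Python is not configured.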