---
title: "Calculating LOD from Olink® Explore data"
output:
  html_vignette:
    toc: true
    toc_depth: 3
    includes:
      in_header: ../man/figures/logo.html
vignette: >
  %\VignetteIndexEntry{Calculating LOD from Olink® Explore data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = FALSE,
  tidy.opts = list(width.cutoff = 95),
  fig.width = 6,
  fig.height = 3,
  message = FALSE,
  warning = FALSE,
  time_it = TRUE,
  fig.align = "center"
)
```

## Introduction

This tutorial describes how to use Olink^®^ Analyze to integrate Limit
of Detection (LOD) into Olink^®^ Explore HT, Olink^®^ Reveal, and
Olink^®^ Explore 384/3072 datasets. Although it is recommended to use
all Olink Explore and Olink Reveal data in downstream analyses, LOD
information can be useful when performing technical evaluations of a
dataset.

In this tutorial, you will learn how to use `olink_lod()` to add LOD
information to your Olink Explore or Olink Reveal dataset. Note that
Olink Analyze does not contain example Olink Explore HT, Olink Reveal,
or Olink Explore 384/3072 datasets within the package, so external data
is needed for the code below to work. The external data should contain
internal and external controls for proper calculation and
normalization. Replace all file paths with the paths to your data and,
if applicable, your fixed LOD reference file.

## Integrating LOD

Limit of Detection (LOD) is a metric that indicates the lowest
measurable value of a protein. LOD can be helpful when performing
technical evaluations of NPX™ datasets, such as calculating CVs.

Note that LOD is less important in downstream statistical analyses, as
values below LOD typically converge across groups. As such, including
data below LOD is unlikely to increase the risk of false positive
discoveries.
Furthermore, data below LOD can be instrumental in downstream analyses
such as biomarker discovery, as a protein may be well expressed in one
group and not measured in another group. In this case, the protein can
be a strong biomarker candidate for specific groups.

LOD can be added to Olink Explore or Olink Reveal NPX datasets using
`olink_lod()`. This function can calculate LOD from an NPX dataset using
either the dataset's negative controls or a list of predetermined fixed
LOD values (available in the Document Download Center at
[olink.com](https://olink.com/knowledge/documents)). By default,
`olink_lod()` calculates LOD using a dataset's negative controls.

Olink Explore and Olink Reveal data are delivered as either plate
control (PC) normalized or intensity normalized (the normalization type
is indicated in the Normalization column of the NPX file), where the
latter depends on the assumption that the analyzed samples are
randomized. The values are reported in two columns, NPX and
PCNormalizedNPX. Note that for PC normalized datasets the content of
these two columns is identical, while for intensity normalized datasets
the NPX column contains the intensity normalized values.

Similarly, `olink_lod()` adds two columns to your dataset: LOD and
PCNormalizedLOD. For a PC normalized dataset, the content of these two
columns is identical, while for an intensity normalized dataset the LOD
column contains intensity normalized LOD values. Examples of results
for plate control and intensity normalization are shown in the tables
below.

```{r, echo = FALSE}
set.seed(1234)

# Load example dataset (npx_data1)
npx_data1 <- OlinkAnalyze::npx_data1

# NPX file preprocessing
## Generate check log
check_log_npx_data1 <- OlinkAnalyze::check_npx(
  df = npx_data1
)
## Clean NPX
npx_data1_clean <- OlinkAnalyze::clean_npx(
  df = npx_data1,
  check_log = check_log_npx_data1
)
## Generate check log on cleaned data
check_log_npx_data1_clean <- OlinkAnalyze::check_npx(
  df = npx_data1_clean
)

table1 <- npx_data1_clean |>
  dplyr::slice_head(
    n = 6L
  ) |>
  dplyr::select(
    -dplyr::all_of(
      c("Index", "MissingFreq", "Panel_Version", "QC_Warning", "Subject",
        "Treatment", "Site", "Time", "Project", "Panel", "PlateID")
    )
  ) |>
  dplyr::mutate(
    Count = round(
      x = .data[["NPX"]] *
        (100L + sample(x = seq(from = -5L, to = 15L), size = 1L))
    )
  ) |>
  dplyr::mutate(
    SampleType = "SAMPLE"
  ) |>
  dplyr::mutate(
    Normalization = "Plate control"
  ) |>
  dplyr::mutate(
    NPX = round(x = .data[["NPX"]], digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(x = .data[["LOD"]], digits = 2L)
  ) |>
  dplyr::mutate(
    PCNormalizedNPX = .data[["NPX"]]
  ) |>
  dplyr::mutate(
    PCNormalizedLOD = .data[["LOD"]]
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count",
        "NPX", "PCNormalizedNPX", "Normalization", "LOD", "PCNormalizedLOD")
    )
  )

table1 |>
  knitr::kable(
    caption = "Example results from Plate Control Normalized Project"
  ) |>
  kableExtra::kable_styling(
    font_size = 10L
  )

table1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  ) |>
  dplyr::mutate(
    NPX = round(x = .data[["NPX"]] + 4.16, digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(x = .data[["LOD"]] + 4.16, digits = 2L)
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count",
        "NPX", "PCNormalizedNPX", "Normalization", "LOD", "PCNormalizedLOD")
    )
  ) |>
  knitr::kable(
    caption = "Example results from Intensity Normalized Project"
  ) |>
  kableExtra::kable_styling(
    font_size = 10
  )
```

## Import Olink NGS datasets

Olink Next Generation Sequencing (NGS)
datasets are standard Olink Explore HT, Olink Reveal, and Olink Explore
384/3072 NPX tables. The `read_NPX()` function can be used to import an
NPX file in parquet format as generated by Olink Software. More
information on using `read_NPX()` can be found in the [Olink Analyze
Overview tutorial](https://CRAN.R-project.org/package=OlinkAnalyze).

```{r npx_data_for_lod, eval = FALSE, message = FALSE, warning = FALSE}
# Preprocessing steps for both fixed LOD and Negative Control LOD integration

## Load NPX file
df_npx <- OlinkAnalyze::read_NPX(
  filename = "Path_to/Explore_NPX_file.parquet"
)

## Check NPX data
check_log_df_npx <- OlinkAnalyze::check_npx(
  df = df_npx
)

## Clean NPX data
df_npx_clean <- OlinkAnalyze::clean_npx(
  df = df_npx,
  check_log = check_log_df_npx,
  # do not remove controls or warnings to ensure LOD can be calculated correctly
  remove_control_assay = FALSE,
  remove_control_sample = FALSE,
  remove_assay_warning = FALSE,
  remove_qc_warning = FALSE
)

## Generate check log on cleaned data
check_log_df_npx_clean <- OlinkAnalyze::check_npx(
  df = df_npx_clean
)

# Cleanup intermediate objects
rm(
  df_npx,
  check_log_df_npx
)
```

------------------------------------------------------------------------

#### *A note on calculating LOD with clean_npx()*

*When `clean_npx()` is used before calculating LOD, disable the
exclusion of internal controls, external controls, assay warnings and
sample warnings to ensure that the LOD values are calculated correctly.
To do so, set the following arguments of `clean_npx()` to `FALSE`:
`remove_control_assay`, `remove_control_sample`, `remove_assay_warning`
and `remove_qc_warning`. This applies regardless of the LOD method used
(fixed LOD or Negative Control LOD).*

*By default, `clean_npx()` removes both internal and external controls,
which are required to calculate count-based LOD values.
If `olink_lod()` is run without the internal or external controls
present, the function will still execute, but all count-based LODs will
be assigned `NA` values. In addition, `olink_lod()` itself excludes only
the assay and QC warnings that are relevant for the LOD calculation, so
these warnings do not need to be removed beforehand.*

*After LOD has been calculated, `clean_npx()` can be applied with its
default settings to remove controls and warnings from the dataset and
prepare it for downstream analyses.*

*More information on `clean_npx()` can be found in the [Olink Analyze
Overview tutorial](https://CRAN.R-project.org/package=OlinkAnalyze).*

------------------------------------------------------------------------

## Integrating Negative Control LOD

The negative control (NC) LOD method requires at least 10 negative
controls in a dataset. Negative control data is available in the
standard exported Olink NGS NPX parquet files. NCs can be identified
through the SampleID and SampleType columns. A negative control will not
contribute to the minimum number of required NCs if the negative control
does not pass sample QC criteria (sample QC failure or warning) in all
of the data (i.e. all datapoints measured for that sample).

Negative controls are used to calculate LOD from either PC normalized
NPX or counts. For assays with more than 150 counts in at least one of
the negative controls, LOD is calculated as the median PC normalized NPX
of the negative controls plus 3 standard deviations or 0.2 NPX,
whichever is larger. For assays with fewer than 150 counts in all
negative controls, LOD is calculated using the count values, which are
then converted into PC normalized NPX.

------------------------------------------------------------------------

#### *A note on calculating LOD from counts*

*Some assays will use count values as the LOD because the assay receives
very few counts in the negative controls.
For the convenience of data processing, `olink_lod()` converts the LOD
in counts to NPX values. The single LOD value for such an assay (in
counts) becomes many LOD values in NPX, because extension control counts
vary across samples. In addition, minor changes on the counts scale can
result in large changes on the NPX scale when counts are small. The
reason is that NPX is a relative, log2-based scale, calculated by
dividing the counts of the assay by the counts of the extension control.
For example, given that the extension control values remain constant, a
change from 1 count to 2 counts corresponds to a change of 1 NPX, while
a change from 1000 counts to 1001 counts is negligible on the NPX
scale.*

*Furthermore, due to the low number of counts, the NPX values calculated
from these counts do not correlate to true background levels. The
converted NPX values should not be used as LOD values for these assays.*

------------------------------------------------------------------------

The resulting LOD is the PC normalized negative control LOD. If the
Olink NGS dataset is intensity normalized, an intensity normalization
adjustment factor is applied; the resulting intensity normalized LOD is
reported in the LOD column and the PC normalized LOD is reported in the
PCNormalizedLOD column.

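
The NPX-based rule described earlier in this section can be sketched in
a few lines of base R. This is a minimal illustration under stated
assumptions, not the internal implementation of `olink_lod()`: the
helper `toy_nc_lod()` and all values are hypothetical, the handling of
an assay with exactly 150 counts is an assumption, and the count-based
branch is not reproduced (the sketch simply returns `NA` there).

```{r, eval = FALSE}
# Toy sketch of the NPX-based negative control LOD rule.
# `toy_nc_lod()` is a hypothetical helper, not part of Olink Analyze.
toy_nc_lod <- function(nc_pc_npx, nc_counts, count_cutoff = 150) {
  # Assumption: the NPX-based rule applies when at least one negative
  # control exceeds the count cutoff
  if (!any(nc_counts > count_cutoff)) {
    # The count-based LOD calculation is not reproduced in this sketch
    return(NA_real_)
  }
  # Median plus the larger of 3 standard deviations and 0.2 NPX
  stats::median(nc_pc_npx) + max(3 * stats::sd(nc_pc_npx), 0.2)
}

# Made-up PC normalized NPX and counts for one assay in 10 negative controls
nc_pc_npx <- c(-1.2, -0.9, -1.1, -1.4, -1.0, -0.8, -1.3, -1.1, -0.7, -1.0)
nc_counts <- c(210, 180, 160, 140, 175, 220, 150, 165, 230, 190)

lod_high_counts <- toy_nc_lod(nc_pc_npx = nc_pc_npx, nc_counts = nc_counts)
lod_low_counts <- toy_nc_lod(nc_pc_npx = nc_pc_npx, nc_counts = rep(12, 10))
```

In this toy example the spread of the negative controls is large enough
that 3 standard deviations exceed the 0.2 NPX floor, so
`lod_high_counts` equals the median plus 3 standard deviations, while
`lod_low_counts` is `NA` because no negative control exceeds the cutoff.
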
```{r NCLOD_example, eval = FALSE, message = FALSE, warning = FALSE}
# Calculate LOD from negative controls

## Calculate LOD
df_npx_nc_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_method = "NCLOD",
  check_log = check_log_df_npx_clean
)

## Generate check log on data with LOD
check_log_df_npx_nc_lod <- OlinkAnalyze::check_npx(
  df = df_npx_nc_lod
)

## Clean NPX data with LOD
df_npx_nc_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_nc_lod,
  check_log = check_log_df_npx_nc_lod
)

## Generate check log on cleaned data with LOD
check_log_df_npx_nc_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_nc_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_nc_lod,
  check_log_df_npx_nc_lod
)

# Rename final cleaned data with LOD for clarity
df_npx_nc_lod <- df_npx_nc_lod_clean
check_log_df_npx_nc_lod <- check_log_df_npx_nc_lod_clean
rm(
  df_npx_nc_lod_clean,
  check_log_df_npx_nc_lod_clean
)
```

## Integrating Fixed LOD

The fixed LOD method uses fixed LOD values that have been calculated on
negative controls used in Olink reference runs using the method
described above for negative control LOD. These values are specific to
the Data Analysis Reference ID, which can be found in your dataset. The
fixed LOD data is available in an external CSV file which can be
downloaded from the Document Download Center at
[olink.com](https://olink.com/knowledge/documents). The fixed LOD values
reported in this CSV file are the PC normalized LODs.

The fixed LOD file is read into the `olink_lod()` function to be
integrated into an Olink NGS dataset. In the event that the NGS dataset
is intensity normalized, an intensity normalization adjustment factor is
applied and the resulting intensity normalized LOD is reported in the
LOD column and the PC normalized LOD is reported in the PCNormalizedLOD
column.

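
Conceptually, integrating fixed LOD amounts to matching each data point
to a reference value by assay and Data Analysis Reference ID. The base R
sketch below illustrates that idea on made-up data. All column names
(for example `DataAnalysisRefID` and `FixedLODExample`), IDs and values
are assumptions chosen for illustration and may not match the layout of
the real fixed LOD CSV file; `olink_lod()` handles the actual file.

```{r, eval = FALSE}
# Made-up NPX data: two samples measured on two assays
# (column names and IDs are hypothetical)
toy_npx <- data.frame(
  SampleID = c("S1", "S1", "S2", "S2"),
  OlinkID = c("OID00001", "OID00002", "OID00001", "OID00002"),
  DataAnalysisRefID = "D0001",
  PCNormalizedNPX = c(1.8, -0.4, 2.1, 0.6)
)

# Made-up fixed LOD reference: one value per assay and reference ID
toy_fixed_lod <- data.frame(
  OlinkID = c("OID00001", "OID00002"),
  DataAnalysisRefID = "D0001",
  FixedLODExample = c(0.5, 0.1)
)

# Attach the reference LOD to every data point of the matching assay
toy_npx_lod <- merge(
  x = toy_npx,
  y = toy_fixed_lod,
  by = c("OlinkID", "DataAnalysisRefID"),
  all.x = TRUE
)

# Flag data points below the attached LOD, e.g. for a technical evaluation
toy_npx_lod$BelowLOD <-
  toy_npx_lod$PCNormalizedNPX < toy_npx_lod$FixedLODExample
```

Because the toy reference holds one value per assay, every sample
measured on the same assay receives the same fixed LOD.
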
```{r FixedLOD, eval = FALSE, message = FALSE, warning = FALSE}
# Integrating fixed LOD

## Fixed LOD file path
fixedlod_filepath <- "Path_to/ExploreHT_fixedLOD.csv"

## Calculate LOD
df_npx_fixed_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_method = "FixedLOD",
  lod_file_path = fixedlod_filepath,
  check_log = check_log_df_npx_clean
)

## Generate check log on data with LOD
check_log_df_npx_fixed_lod <- OlinkAnalyze::check_npx(
  df = df_npx_fixed_lod
)

## Clean NPX data with LOD
df_npx_fixed_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_fixed_lod,
  check_log = check_log_df_npx_fixed_lod
)

## Generate check log on cleaned data with LOD
check_log_npx_fixed_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_fixed_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_fixed_lod,
  check_log_df_npx_fixed_lod
)

# Rename final cleaned data with LOD for clarity
df_npx_fixed_lod <- df_npx_fixed_lod_clean
check_log_df_npx_fixed_lod <- check_log_npx_fixed_lod_clean
rm(
  df_npx_fixed_lod_clean,
  check_log_npx_fixed_lod_clean
)
```

## When to Use Fixed LOD vs NC LOD

For smaller studies (\<10 NCs) we recommend using fixed LOD to integrate
LOD values into your NPX dataset, as LOD calculations on fewer NCs may
yield inaccurate values. However, keep in mind that fixed LOD values are
not specific to your project; rather, they are generated by Olink when a
new lot of reagents is released.

For larger projects we recommend calculating LOD from NCs to obtain LOD
values that are specific to your project. However, this requires that
the dataset has at least 10 NCs with passing SampleQC.

## Integrating Both NC LOD and Fixed LOD

There is also the option to calculate both NC LOD and fixed LOD for a
data file by setting `lod_method` to `"Both"`. The resulting data will
have four additional columns whose names start with NC or Fixed, to
indicate the method used to calculate LOD, followed by LOD or
PCNormalizedLOD as explained above.
An example of the file format is shown below. Note that these columns
will not automatically be recognized by other functions within Olink
Analyze that use LOD (for example `olink_bridgeselector()`). To use
these functions, the column holding the LOD values to be used must be
named "LOD".

```{r, echo = FALSE}
table1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  ) |>
  dplyr::mutate(
    PCNormalizedNPX = round(x = .data[["NPX"]], digits = 2L)
  ) |>
  dplyr::mutate(
    PCNormalizedLOD = round(x = .data[["LOD"]], digits = 2L)
  ) |>
  dplyr::mutate(
    NPX = round(.data[["NPX"]] + 4.16, digits = 2L)
  ) |>
  dplyr::mutate(
    LOD = round(.data[["LOD"]] + 4.16, digits = 2L)
  ) |>
  dplyr::rename(
    "FixedLOD" = "LOD",
    "FixedPCNormalizedLOD" = "PCNormalizedLOD"
  ) |>
  dplyr::mutate(
    NCLOD = .data[["FixedLOD"]] - 2.34,
    NCPCNormalizedLOD = .data[["FixedPCNormalizedLOD"]] - 2.34
  ) |>
  dplyr::select(
    dplyr::all_of(
      c("SampleID", "SampleType", "OlinkID", "UniProt", "Assay", "Count",
        "NPX", "Normalization", "PCNormalizedNPX", "FixedLOD",
        "FixedPCNormalizedLOD", "NCLOD", "NCPCNormalizedLOD")
    )
  ) |>
  knitr::kable(
    caption = "Example results using both LOD calculation methods"
  ) |>
  kableExtra::kable_styling(
    font_size = 10L
  )
```

## Adjusting LOD for Intensity Normalized Data

If an Olink NGS dataset is intensity normalized, a normalization
adjustment factor is applied to the PC normalized LOD within the
`olink_lod()` function. For each assay, this adjustment factor is
calculated as the median NPX of all samples (excluding Olink's external
controls) within each plate. For Olink Explore 3072, overlapping assays
are assessed separately, within their respective panels. The intensity
normalized negative control LOD is calculated by subtracting this
adjustment factor from the PC normalized negative control LOD. The
intensity normalization LOD adjustment is applied to both the negative
control and fixed LOD methods.

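
The adjustment described above can be illustrated with a small base R
sketch. It assumes a toy data frame with one assay measured on two
plates, uses PC normalized NPX to compute the median (an assumption; the
text only states "median NPX"), and treats rows with
`SampleType == "SAMPLE"` as the non-control samples. All values are made
up, and this is not the code used inside `olink_lod()`.

```{r, eval = FALSE}
# Made-up data: one assay, two plates, three samples and one negative
# control per plate (SampleType labels are assumed for illustration)
toy_df <- data.frame(
  PlateID = rep(c("Plate1", "Plate2"), each = 4L),
  OlinkID = "OID00001",
  SampleType = rep(
    c("SAMPLE", "SAMPLE", "SAMPLE", "NEGATIVE_CONTROL"),
    times = 2L
  ),
  PCNormalizedNPX = c(2.0, 3.0, 4.0, -1.0, 1.0, 1.5, 2.0, -1.2),
  PCNormalizedLOD = 0.5
)

# Adjustment factor: median NPX of the samples, per plate and assay,
# excluding external controls
adj_factor <- stats::aggregate(
  PCNormalizedNPX ~ PlateID + OlinkID,
  data = toy_df[toy_df$SampleType == "SAMPLE", ],
  FUN = stats::median
)
names(adj_factor)[names(adj_factor) == "PCNormalizedNPX"] <- "AdjFactor"

# Intensity normalized LOD = PC normalized LOD - adjustment factor
toy_df <- merge(x = toy_df, y = adj_factor, by = c("PlateID", "OlinkID"))
toy_df$LOD <- toy_df$PCNormalizedLOD - toy_df$AdjFactor
```

In this toy example the sample medians are 3.0 on Plate1 and 1.5 on
Plate2, so a PC normalized LOD of 0.5 becomes -2.5 and -1.0,
respectively.
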
## Handling LOD in Bridge-Normalized Data

When using the `olink_normalization_bridge()` function to bridge two
datasets, the reference project remains unchanged throughout the
bridging procedure, including the NC LOD (calculated from negative
controls in the reference project) and the fixed LOD.

In contrast, the LOD values (both NC LOD and fixed LOD) for the
non-reference project are adjusted using the same adjustment factor that
is applied to all other samples for the corresponding assay. This
adjustment factor is the median of the paired NPX differences per assay
between the bridging samples.

Consequently, for **within-product bridging**, LOD values from both the
reference and non-reference projects can be used for downstream
analysis. For **between-product bridging**, differences in assay
bridgeability between products and the use of distinct normalization
methods (median centering versus quantile smoothing) should be taken
into account. Therefore, we recommend applying the LOD values from the
reference project to the non-reference project in between-product
bridging.

## Export Olink NGS Data with LOD

Olink NGS data with LOD information can be exported as a parquet file in
long format using `arrow::write_parquet()`.

```{r explore_npx_export, eval = FALSE, message = FALSE, warning = FALSE}
# Exporting Olink NGS data with LOD information as a parquet file

## Integrate both Negative Control LOD and fixed LOD before NPX preprocessing
df_npx_both_lod <- OlinkAnalyze::olink_lod(
  data = df_npx_clean,
  lod_file_path = fixedlod_filepath,
  lod_method = "Both",
  check_log = check_log_df_npx_clean
)

## Generate check log
check_log_df_npx_both_lod <- OlinkAnalyze::check_npx(
  df = df_npx_both_lod
)

## Clean NPX
df_npx_both_lod_clean <- OlinkAnalyze::clean_npx(
  df = df_npx_both_lod,
  check_log = check_log_df_npx_both_lod
)

## Generate check log on cleaned data
check_log_npx_both_lod_clean <- OlinkAnalyze::check_npx(
  df = df_npx_both_lod_clean
)

# Cleanup intermediate objects
rm(
  df_npx_both_lod,
  check_log_df_npx_both_lod
)

# Rename final cleaned data with both LOD for clarity
df_npx_both_lod <- df_npx_both_lod_clean
check_log_df_npx_both_lod <- check_log_npx_both_lod_clean
rm(
  df_npx_both_lod_clean,
  check_log_npx_both_lod_clean
)

# Add metadata for export
df_npx_both_lod_arrow <- df_npx_both_lod |>
  arrow::as_arrow_table()

df_npx_both_lod_arrow$metadata$FileVersion <- "NA"
df_npx_both_lod_arrow$metadata$ExploreVersion <- "NA"
df_npx_both_lod_arrow$metadata$ProjectName <- "NA"
df_npx_both_lod_arrow$metadata$SampleMatrix <- "NA"
df_npx_both_lod_arrow$metadata$DataFileType <- "Olink Analyze Export File"
# One of "ExploreHT", "Explore3072", or "Reveal"
df_npx_both_lod_arrow$metadata$ProductType <- "ExploreHT"
# One of "ExploreHT", "Explore3072", or "Reveal"
df_npx_both_lod_arrow$metadata$Product <- "ExploreHT"

arrow::write_parquet(
  x = df_npx_both_lod_arrow,
  sink = "path_to_output.parquet"
)
```

## Contact Us

We are always happy to help.
Email us with any questions:

-   biostat\@olink.com for statistical services and general stats
    questions

-   support\@olink.com for Olink lab product and technical support

-   info\@olink.com for more information

## Legal Disclaimer

© `r format(Sys.Date(), "%Y")` Olink Proteomics AB, part of Thermo
Fisher Scientific.

Olink products and services are For Research Use Only. Not for use in
diagnostic procedures.

All information in this document is subject to change without notice.
This document is not intended to convey any warranties, representations
and/or recommendations of any kind, unless such warranties,
representations and/or recommendations are explicitly stated.

Olink assumes no liability arising from a prospective reader’s actions
based on this document.

OLINK, NPX, PEA, PROXIMITY EXTENSION, INSIGHT and the Olink logotype are
trademarks registered, or pending registration, by Olink Proteomics AB.
All third-party trademarks are the property of their respective owners.

Olink products and assay methods are covered by several patents and
patent applications
[https://www.olink.com/patents/](https://olink.com/patents/).