---
title: "tlsR Workflow: From Raw Imaging Data to TLS Characterisation"
author: "Ali Amiryousefi"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{tlsR Workflow: From Raw Imaging Data to TLS Characterisation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 6,
  fig.height = 5,
  eval = TRUE
)
```

## Introduction

Tertiary lymphoid structures (TLS) are ectopic lymphoid organs that form in
non-lymphoid tissues — most notably in tumours — and are associated with
improved patient outcomes and immunotherapy response.  **tlsR** provides a
fast, reproducible pipeline for detecting TLS and characterising their spatial
organisation in multiplexed tissue imaging data (e.g. mIHC, CODEX, IMC).

The core pipeline is:

```
Raw ldata list
     │
     ▼
detect_TLS()        ← KNN-based B+T co-localisation
     │
     ├──► scan_clustering()   ← Optional: local Ripley's L
     │
     ├──► calc_icat()         ← ICAT linearity score per TLS
     │
     ├──► detect_tic()        ← T-cell clusters outside TLS
     │
     ├──► summarize_TLS()     ← Tidy summary table
     │
     └──► plot_TLS()          ← Publication-ready spatial plot
```

---

## Data Format

`tlsR` expects a **named list of data frames** (`ldata`), one element per
tissue sample.  Each data frame must contain at minimum:

| Column      | Type      | Description                                      |
|-------------|-----------|--------------------------------------------------|
| `x`         | numeric   | X coordinate in microns                          |
| `y`         | numeric   | Y coordinate in microns                          |
| `phenotype` | character | Cell label; must contain `"B cell"` / `"T cell"` |

Additional columns (e.g. cell area, marker intensities) are silently ignored.
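
As a minimal sketch of this layout, the following builds a one-sample `ldata` list from simulated coordinates and checks the required columns. The sample name `"SampleA"`, the helper `check_ldata()`, and the simulated values are hypothetical and not part of tlsR.

```r
# Hypothetical example data: 200 cells with random positions (microns)
set.seed(1)
cells <- data.frame(
  x         = runif(200, 0, 1000),
  y         = runif(200, 0, 1000),
  phenotype = sample(c("B cell", "T cell", "Other"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# A named list with one data frame per tissue sample
my_ldata <- list(SampleA = cells)

# Hypothetical helper (not a tlsR function): TRUE for each sample that
# has the three required columns with the expected types
check_ldata <- function(ld) {
  vapply(ld, function(df) {
    all(c("x", "y", "phenotype") %in% names(df)) &&
      is.numeric(df$x) && is.numeric(df$y) && is.character(df$phenotype)
  }, logical(1))
}

check_ldata(my_ldata)
```

Running a check like this before the pipeline makes a missing or mistyped column visible early, one logical value per sample.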

```{r load-data}
library(tlsR)

data(toy_ldata)

# Structure of the built-in example dataset
str(toy_ldata)
table(toy_ldata[["ToySample"]]$phenotype)
```

---

## Step 1 — Detect TLS with `detect_TLS()`

`detect_TLS()` identifies B-cell-rich regions with sufficient T-cell
co-localisation using a KNN density approach.
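
To give a feel for what a KNN density measure captures, the base-R sketch below computes each cell's mean distance to its `k` nearest neighbours on simulated points; tightly packed cells get small values. This is an illustration only, not the implementation inside `detect_TLS()`, and the helper `mean_knn_dist()` is hypothetical.

```r
# Hypothetical helper: mean distance from each point to its k nearest neighbours
mean_knn_dist <- function(xy, k) {
  d <- as.matrix(dist(xy))          # full pairwise distance matrix
  diag(d) <- Inf                    # exclude each point's distance to itself
  apply(d, 1, function(row) mean(sort(row)[seq_len(k)]))
}

set.seed(42)
dense  <- cbind(rnorm(100, 500, 20), rnorm(100, 500, 20))   # tight aggregate
sparse <- cbind(runif(100, 0, 1000), runif(100, 0, 1000))   # scattered cells
xy     <- rbind(dense, sparse)

knn <- mean_knn_dist(xy, k = 10)

# Cells in the aggregate sit much closer to their neighbours
c(dense = median(knn[1:100]), sparse = median(knn[101:200]))
```

The full distance matrix is only practical for small examples; real imaging data with many thousands of cells would need a spatial index instead.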

```{r detect-tls}
# Reload the toy data. If it has no `phenotype` column, copy the labels from
# `coarse_phen_vec` so that the input matches the format described above.
data(toy_ldata)
if (!"phenotype" %in% names(toy_ldata[["ToySample"]])) {
  toy_ldata[["ToySample"]]$phenotype <- toy_ldata[["ToySample"]]$coarse_phen_vec
}
ldata <- detect_TLS(
  LSP                     = "ToySample",
  k                       = 30,     # neighbours for density estimation
  bcell_density_threshold = 15,     # min avg 1/k-distance (um)
  min_B_cells             = 50,     # min B cells per candidate TLS
  min_T_cells_nearby      = 30,     # min T cells within max_distance_T
  max_distance_T          = 50,     # search radius (um)
  ldata                   = toy_ldata
)

table(ldata[["ToySample"]]$tls_id_knn)
```

The new column `tls_id_knn` is `0` for non-TLS cells and a positive integer
for cells assigned to TLS 1, 2, 3, … .
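
Because `tls_id_knn` is an ordinary column, per-TLS composition can be tabulated with base R. The sketch below uses a made-up data frame (`mock`) in place of a real annotated sample.

```r
# Mock data frame standing in for one annotated sample (values are made up)
mock <- data.frame(
  phenotype  = c("B cell", "T cell", "B cell", "Other", "T cell", "B cell"),
  tls_id_knn = c(1, 1, 2, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Keep only cells assigned to a TLS, then cross-tabulate by TLS and phenotype
in_tls <- mock[mock$tls_id_knn > 0, ]
comp   <- table(TLS = in_tls$tls_id_knn, phenotype = in_tls$phenotype)
comp
```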

### Quick base-R check plot

```{r base-plot, fig.alt="Scatter plot of ToySample cells coloured by TLS membership"}
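df  <- ldata[["ToySample"]]
pal <- c("#0072B2", "#009E73", "#CC79A7")
ids <- sort(unique(df$tls_id_knn[!is.na(df$tls_id_knn) & df$tls_id_knn > 0]))

# Start with grey for every cell, then colour TLS cells by ID.
# Indexing the palette with 0 would silently drop elements and misalign
# colours, so only TLS rows are indexed; the palette recycles beyond 3 TLS.
is_tls <- !is.na(df$tls_id_knn) & df$tls_id_knn > 0
col    <- rep("grey80", nrow(df))
col[is_tls] <- pal[(df$tls_id_knn[is_tls] - 1) %% length(pal) + 1]

plot(df$x, df$y,
     col  = col, pch = 19, cex = 0.3,
     xlab = "x (um)", ylab = "y (um)",
     main = "Detected TLS: ToySample")
legend("topright",
       legend = c("Background", paste0("TLS ", ids)),
       col    = c("grey80", pal[(ids - 1) %% length(pal) + 1]),
       pch    = 19, pt.cex = 1.2, bty = "n")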
```

---

## Step 2 — Local Ripley's L with `scan_clustering()` (Optional)

`scan_clustering()` slides a square window across the tissue and tests each
window for statistically significant immune cell clustering, using Ripley's L
with a Monte Carlo envelope simulated under complete spatial randomness (CSR).

```{r scan, eval = FALSE}
# eval=FALSE because this step can take ~10–30 s on real data
windows <- scan_clustering(
  ws        = 500,          # window side (um)
  sample    = "ToySample",
  phenotype = "B cells",
  nsim      = 39,           # Monte Carlo simulations (39 → p < 0.05)
  plot      = FALSE,
  ldata     = ldata
)

cat("Significant windows:", length(windows), "\n")
# Access the first window's centre and cell count:
if (length(windows) > 0) {
  cat("Centre:", windows[[1]]$window_center, "\n")
  cat("Cells: ", windows[[1]]$n_cells, "\n")
}
```
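
The comment on `nsim` reflects the usual arithmetic for pointwise Monte Carlo envelopes: when the observed curve is compared with the most extreme of `nsim` simulated curves, a two-sided test has significance level 2 / (nsim + 1). The sketch below assumes `scan_clustering()` follows that convention; the helper `envelope_alpha()` is hypothetical.

```r
# Hypothetical helper: pointwise significance level of a Monte Carlo envelope
# built from the nrank-th most extreme of nsim simulations
envelope_alpha <- function(nsim, nrank = 1, two_sided = TRUE) {
  (if (two_sided) 2 else 1) * nrank / (nsim + 1)
}

envelope_alpha(39)    # 0.05
envelope_alpha(99)    # 0.02
envelope_alpha(199)   # 0.01
```

Raising `nsim` gives a finer significance level at the cost of proportionally more simulation time per window.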

---

## Step 3 — ICAT Score with `calc_icat()`

The **ICAT (Immune Cell Arrangement Trace)** index quantifies how linearly
organised cells are within a TLS.  A higher value indicates a more structured
(germinal-centre-like) arrangement.

```{r icat}
n_tls <- max(ldata[["ToySample"]]$tls_id_knn, na.rm = TRUE)

if (n_tls >= 1) {
  icat_scores <- vapply(
    seq_len(n_tls),
    function(id) calc_icat("ToySample", tlsID = id, ldata = ldata),
    numeric(1)
  )
  names(icat_scores) <- paste0("TLS", seq_len(n_tls))
  print(icat_scores)
}
```

`calc_icat()` returns `NA` (with a message) if a TLS has too few cells or if
FastICA fails to converge — no errors are thrown.
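
Since missing scores come back as `NA`, downstream code should handle them explicitly. A small sketch with made-up values (`example_scores` is not real output):

```r
# Made-up scores standing in for the result of the chunk above;
# TLS2 represents a TLS for which no score could be computed
example_scores <- c(TLS1 = 0.82, TLS2 = NA, TLS3 = 0.47)

scored <- example_scores[!is.na(example_scores)]   # drop TLS without a score
names(scored)                                      # which TLS were scored
mean(example_scores, na.rm = TRUE)                 # average over scored TLS only
```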

---

## Step 4 — Detect T-cell Clusters with `detect_tic()`

T-cell clusters (TIC) that lie *outside* TLS are identified with HDBSCAN.
The `min_pts` and `min_cluster_size` arguments let you control sensitivity.

```{r detect-tic}
ldata <- detect_tic(
  sample           = "ToySample",
  min_pts          = 10,   # HDBSCAN minPts
  min_cluster_size = 10,   # drop clusters smaller than this
  ldata            = ldata
)

table(ldata[["ToySample"]]$tcell_cluster_hdbscan, useNA = "ifany")
```
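
One way to picture the `min_cluster_size` filter is as a post-processing step on cluster labels. The sketch below is an assumption about what "drop clusters smaller than this" means, not the code inside `detect_tic()`; it uses made-up labels and treats `0` as noise, which may differ from the coding tlsR uses.

```r
# Made-up cluster labels for 12 T cells; 0 marks noise (an assumption here)
cl <- c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 0)
min_cluster_size <- 4

sizes     <- table(cl[cl > 0])                                # cells per cluster
too_small <- as.integer(names(sizes)[as.vector(sizes) < min_cluster_size])

filtered <- cl
filtered[filtered %in% too_small] <- 0    # relabel undersized clusters as noise
table(filtered)
```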

---

## Step 5 — Summary Table with `summarize_TLS()`

`summarize_TLS()` produces a tidy one-row-per-sample summary — convenient for
downstream statistical analysis.

```{r summary}
sumtbl <- summarize_TLS(ldata, calc_icat_scores = FALSE)
print(sumtbl)
```

With `calc_icat_scores = TRUE`, a list-column `icat_scores` is appended. Each
entry is a named numeric vector holding the per-TLS ICAT values for that sample.
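
A list-column of named vectors can be expanded to one row per TLS for modelling or plotting. The sketch below uses a mock table; apart from `icat_scores`, the column names (`sample`, `n_TLS`) and all values are hypothetical and may not match the actual output of `summarize_TLS()`.

```r
# Mock summary table with a list-column (all values are made up)
mock_summary <- data.frame(sample = c("S1", "S2"), n_TLS = c(2, 1),
                           stringsAsFactors = FALSE)
mock_summary$icat_scores <- list(
  c(TLS1 = 0.8, TLS2 = 0.5),
  c(TLS1 = 0.6)
)

# Expand to one row per TLS
long <- do.call(rbind, lapply(seq_len(nrow(mock_summary)), function(i) {
  s <- mock_summary$icat_scores[[i]]
  data.frame(sample = mock_summary$sample[i], tls = names(s),
             icat = unname(s), stringsAsFactors = FALSE)
}))
long
```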

---

## Step 6 — Visualise with `plot_TLS()`

`plot_TLS()` produces a ggplot2 scatter plot with TLS and TIC coloured
distinctly using a colourblind-friendly palette.

```{r plot-tls, fig.alt="ggplot2 spatial map of ToySample with TLS and TIC highlighted"}
p <- plot_TLS(
  sample     = "ToySample",
  ldata      = ldata,
  show_tic   = TRUE,
  point_size = 0.5,
  alpha      = 0.7
)
```

The returned `ggplot` object can be further customised with standard ggplot2
functions:

```{r plot-custom, fig.alt="Customised TLS plot with dark theme"}
library(ggplot2)
p + theme_dark() + labs(title = "ToySample — dark theme")
```

---

## Multi-Sample Workflow

`tlsR` is designed to scale naturally to many samples.  Simply pass your
full `ldata` list and iterate:

```{r multi-sample, eval = FALSE}
samples <- names(ldata)

# Thread the list through each sample: `ld` is the list as updated so far
ldata <- Reduce(function(ld, s) detect_TLS(s, ldata = ld), samples, ldata)
ldata <- Reduce(function(ld, s) detect_tic(s, ldata = ld), samples, ldata)

summary_all <- summarize_TLS(ldata)
print(summary_all)
```
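
The `Reduce()` pattern above works for any function that takes a sample name plus the list and returns the updated list. The sketch below shows the mechanics with a hypothetical stand-in function (`add_flag()`) and mock data, so it runs without tlsR.

```r
# Stand-in for a tlsR step: takes a sample name and the list, returns the
# updated list (hypothetical function, used only to show the Reduce pattern)
add_flag <- function(s, ldata) {
  ldata[[s]]$processed <- TRUE
  ldata
}

mock_ldata <- list(
  A = data.frame(x = 1:3, y = 3:1),
  B = data.frame(x = 4:5, y = 5:4)
)

# Reduce threads the accumulated list (ld) through one call per sample name
out <- Reduce(function(ld, s) add_flag(s, ldata = ld),
              names(mock_ldata), mock_ldata)

vapply(out, function(df) all(df$processed), logical(1))
```

A plain `for` loop over `names(ldata)` that reassigns `ldata` on each iteration is equivalent and may read more naturally to some users.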

---

## Session Info

```{r session}
sessionInfo()
```