---
title: "Introduction to fdth"
author: "Faria, J. C.; Allaman, I. B.; Jelihovschi, E. G."
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to fdth}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  fig.align = "center"
)
```

## Overview

`fdth` builds **frequency distribution tables** (fdt) and their associated graphics from vectors, data frames, and matrices for both **numerical** and **categorical** variables.

Core functions:

| Function | Purpose |
|---|---|
| `fdt()` | Frequency table for numerical data |
| `fdt_cat()` | Frequency table for categorical data |
| `make.fdt()` | Reconstruct a table from frequencies alone |
| `make.fdt_cat()` | Reconstruct a categorical table from frequencies |
| `mfv()` | Most frequent value (mode) |
| `sd()` / `var()` | Standard deviation / variance for grouped data |

```{r load}
library(fdth)
```

---

## 1. Numerical data: `fdt()`

### 1.1 Basic usage

```{r fdt-basic}
set.seed(42)
x <- rnorm(200, mean = 10, sd = 2)

ft <- fdt(x)
ft
```

The default table has six columns:

| Column | Description |
|---|---|
| `Class limits` | Interval notation |
| `f` | Absolute frequency |
| `rf` | Relative frequency |
| `rf(%)` | Relative frequency (%) |
| `cf` | Cumulative frequency |
| `cf(%)` | Cumulative frequency (%) |

### 1.2 Choosing the number of classes

```{r fdt-breaks}
# Sturges (default)
fdt(x, breaks = "Sturges")

# Scott
fdt(x, breaks = "Scott")

# Freedman-Diaconis
fdt(x, breaks = "FD")

# Fixed number of classes
fdt(x, k = 8)
```

### 1.3 Custom interval boundaries

```{r fdt-custom}
# Fixed start, end and width
ft2 <- fdt(x, start = 4, end = 16, h = 2)
ft2
```

### 1.4 Formatting class limits

Use `format.classes = TRUE` together with `pattern` to control the number of decimal places displayed in the class limits:

```{r fdt-format}
# Two decimal places
print(ft, format.classes = TRUE, pattern = "%.2f")

# Summary with the same formatting
summary(ft, format.classes = TRUE, pattern = "%.2f")
```

### 1.5 Right-closed intervals

By default, intervals are left-closed `[a, b)`. Use `right = TRUE` for right-closed `(a, b]`:

```{r fdt-right}
fdt(x, right = TRUE)
```

### 1.6 Missing values

```{r fdt-na}
x_na <- c(x, NA, NA)

# This errors by design:
tryCatch(fdt(x_na), error = function(e) message("Error: ", e$message))

# Remove NAs explicitly:
fdt(x_na, na.rm = TRUE)
```

---

## 2. Plots: `plot.fdt.default()`

All plot types are selected with the `type` argument.
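
As a compact preview of the subsections that follow, the sketch below loops over the `type` codes that appear in this vignette's own examples and draws one plot per code. The object `ft_demo`, the vector `types`, and the chunk name are hypothetical names introduced only for this illustration, and the list of codes is deliberately limited to those used here; the package help documents the complete set.

```{r plot-type-loop, fig.show="hold", out.width="48%"}
library(fdth)

# Hypothetical demo data, independent of the objects created elsewhere
set.seed(1)
ft_demo <- fdt(rnorm(100, mean = 10, sd = 2))

# Only the type codes used in this vignette's examples (not the full list)
types <- c("fh", "fp", "rfh", "rfph", "d", "cfp", "cfpp")

# Draw one plot per type, using the code itself as the title
for (tp in types) {
  plot(ft_demo, type = tp, main = tp)
}
```

Using the code as the plot title makes it easy to match each figure to the value passed to `type`.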

### 2.1 Absolute frequency histogram and polygon

```{r plot-fh-fp, fig.show="hold", out.width="48%"}
plot(ft, type = "fh", main = "Frequency histogram")
plot(ft, type = "fp", main = "Frequency polygon")
```

### 2.2 Relative frequency (proportion and percentage)

```{r plot-rf, fig.show="hold", out.width="48%"}
plot(ft, type = "rfh", main = "Relative frequency histogram")
plot(ft, type = "rfph", main = "Relative frequency (%) histogram")
```

### 2.3 Density

```{r plot-density}
plot(ft, type = "d", main = "Density histogram")
```

### 2.4 Cumulative frequency

```{r plot-cf, fig.show="hold", out.width="48%"}
plot(ft, type = "cfp", main = "Cumulative frequency polygon")
plot(ft, type = "cfpp", main = "Cumulative frequency (%) polygon")
```

### 2.5 Value labels on bars

```{r plot-labels}
plot(ft, type = "fh", v = TRUE, v.round = 0, main = "Histogram with counts")
```

---

## 3. Summary statistics from grouped data

Once an `fdt` object exists, the usual statistics can be computed directly from the **grouped** (tabulated) data; no access to the original vector is needed.

```{r stats}
ft3 <- fdt(x)

mean(ft3)
median(ft3)
mfv(ft3)   # mode(s)
var(ft3)
sd(ft3)

# Quartiles (default)
quantile(ft3)

# Deciles
quantile(ft3, i = 1:9, probs = seq(0, 1, 0.1))
```

---

## 4. Multiple numerical variables: `fdt.data.frame()`

When the input is a **data frame** or **matrix**, `fdt()` builds one table per numeric column and returns an `fdt.multiple` object.

### 4.1 All numeric columns

```{r fdt-df}
ft_iris <- fdt(iris[, 1:4])
ft_iris
```

### 4.2 Grouped by a factor

Use the `by` argument to stratify each numeric variable by a categorical column:

```{r fdt-by}
ft_by <- fdt(iris[, c(1, 2, 5)], k = 5, by = "Species")
ft_by
```

### 4.3 Plotting multiple tables

```{r plot-multiple, fig.width=8, fig.height=6}
plot(ft_iris, type = "fh")
```

### 4.4 Statistics on multiple tables

```{r stats-multiple}
mean(ft_iris)
```

---

## 5. Categorical data: `fdt_cat()`

### 5.1 Basic usage

```{r fdt-cat-basic}
set.seed(7)
fruits <- sample(c("apple", "banana", "cherry", "strawberry", "melon"),
                 size = 150, replace = TRUE)

ft_cat <- fdt_cat(fruits)
ft_cat
```

By default, the table is sorted by descending frequency.

### 5.2 Preserving natural order

```{r fdt-cat-nosort}
fdt_cat(fruits, sort = FALSE)
```

### 5.3 Formatting

```{r fdt-cat-format}
print(ft_cat, round = 3)
```

### 5.4 Plots for categorical data

```{r plot-cat-bar}
plot(ft_cat, type = "fb", main = "Frequency bar chart")
```

```{r plot-cat-dotchart}
plot(ft_cat, type = "fd", main = "Frequency dotchart")
```

```{r plot-cat-pareto}
plot(ft_cat, type = "pa", main = "Pareto chart")
```

---

## 6. Reconstructing a table from frequencies

If the original data is no longer available but the frequency table is known, `make.fdt()` and `make.fdt_cat()` rebuild complete `fdt` objects.

```{r make-fdt}
# Numerical
ft_ref <- fdt(x)
ft_new <- make.fdt(f     = ft_ref$table$f,
                   start = ft_ref$breaks["start"],
                   end   = ft_ref$breaks["end"])
print(ft_new, format.classes = TRUE, pattern = "%.2f")
```

```{r make-fdt-cat}
# Categorical
ft_new_cat <- make.fdt_cat(f          = ft_cat$f,
                           categories = ft_cat$Category)
ft_new_cat
```

---

## 7. LaTeX export

For publication-ready LaTeX tables, use `xtable::xtable()` on any `fdt` object. A dedicated vignette covers this workflow in detail:

```{r xtable-ref, eval=FALSE}
vignette("latex_fdt", package = "fdth")
```

---

## Session information

```{r session}
sessionInfo()
```